Speech quality measurement is considered from three points of
view: subjective testing, objective testing, and communicability testing.

Speech quality is interpreted here in terms of user acceptability. It
is assumed that good intelligibility is always present, since otherwise a
system is of no interest here.

Subjective testing is considered from the philosophical perspectives
of iso-preference, relative preference, and absolute preference, with
isometric and parametric test methodologies, and with the results of PARM
and QUART as a basis. It is felt that the best approach for future
subjective testing will be a parametric approach using representative male
and female talkers to cover the expected range of pitch. An automated and
refined version of Voiers' Diagnostic Acceptability Measure (DAM) test is
an attractive option.

Objective testing is considered as a possible alternative to subjective
testing. Reported here is a two part experimental study of the
relationship between a number of objective measures and the subjective
acceptability measures available from the PARM study. In the first part
of the study, controlled distortions were applied to speech samples in
order to measure the resolving power of the candidate objective measures
on these types of distortions. In the second part, the candidate
objective measures were applied to speech samples from the same systems
on which PARM tests were run, and the statistical correlation between the
objective and subjective measures was studied. Objective measures
examined include spectral distance measures (several LPC based spectral
distances, the LPC error power ratio, and cepstral distance), as well as
pitch comparison measures and noise power measures. Controlled
distortions were formant bandwidth, frequency, pitch, low-pass bandwidth,
and additive noise. Correlations with subjective test data range from
~0.2 to ~0.8.

In the communicability test, a somewhat different point of view is
taken. The user is expected to perform on the data some cognitive
task which is measurable. The rationale here is that the user will be
better able to perform if the quality is high than if his cognitive
resource, assumed fixed, is saturated due to poorer quality transmission.
The test format chosen for this study was a multiple digit recall test
similar to that studied at Bell Labs by Nakatani. In this format, sequences
of random digits are first recorded by trained speakers, and then these
utterances are played through various distorting systems. The resulting
sequences are then played to subjects whose task is to "recall" the
digits after a short (1 second) wait. These tests prove to be rather
unpleasant to take, and require larger numbers of subjects, but will
differentiate among distorting systems.




TABLE OF CONTENTS

Foreword
Abstract
List of Figures
List of Tables

I. INTRODUCTION
1.1 - Task History
1.2 - Speech Digitization Systems and Testing Requirements
1.3 - Personnel, Procedures, and Facilities
1.4 - Technical Organization
1.5 - Organization of the Report

II. OBJECTIVE MEASURES FOR SPEECH QUALITY
2.1 - Introduction
2.2 - The Choice of Objective Measures
2.2.1 - The Speech Perception Process
2.2.2 - Specific Objective Quality Measures
2.2.2.1 - Spectral Distance Measures
2.2.2.1.1 - The LPC Spectral Distance Measures
2.2.2.1.2 - Cepstral Spectral Distance Measures
2.2.2.2 - Excitation Feature Extraction
2.2.2.3 - Noise Power Measures
2.3 - Initial Qualitative Studies and Controlled Distortions
2.3.1 - Qualitative Studies
2.3.2 - The Controlled Distortion Experiment
2.3.2.1 - Bandwidth Distortion
2.3.2.2 - Frequency Distortion
2.3.2.3 - Pitch Distortion
2.3.2.4 - Low Pass Filter Distortion
2.3.2.5 - Additive White Noise Distortion
2.3.3 - The Experimental Results
2.3.3.1 - Results of the Vowel Tests
2.3.3.2 - Results of the Sentence Tests
2.4 - The PARM Correlation Study
2.4.1 - The PARM Data Base
2.4.2 - The Statistical Analysis
2.4.3 - The Experimental Results
2.5 - Summary and Areas for Future Research
References

III. SUBJECTIVE PREDICTION OF USER PREFERENCE
3.1 - Introduction
3.2 - Subjective Testing Philosophies
3.3 - Statistical Testing Procedures
3.3.1 - Distribution
3.3.2 - Estimation
3.3.3 - Analysis of PARM Data
3.3.4 - Nonparametric Tests
3.4 - Conclusions and Recommendations
3.4.1 - Isometric Tests
3.4.2 - Tests of Features
3.4.3 - Implementation of Subjective Tests
3.4.4 - Size of the Test
3.4.5 - Speaker Selection
3.4.6 - Overall Recommendations for Subjective Tests
References
Selected Bibliography in Speech Quality Testing

IV. A SUBJECTIVE COMMUNICABILITY TEST
4.1 - Introduction
4.2 - An Automated Speech Subjective Quality Testing Facility
4.3 - The Experimental Format
4.4 - The Data Analysis
4.5 - The Experimental Results
4.6 - Conclusions

APPENDIX A - SPEECH ACCEPTABILITY EVALUATION AT DYNASTAT: THE DIAGNOSTIC ACCEPTABILITY MEASURE (DAM)
APPENDIX B - DERIVATION OF THE PROBABILITY DENSITY FUNCTION FOR THE STUDENTIZED RANGE STATISTIC
APPENDIX C - MINICOMPUTER BASED DIGITAL SIGNAL PROCESSING LABORATORY
APPENDIX D - SOFTWARE SUMMARY


LIST OF FIGURES

Chapter 1
Figure 1.1 - The Basic System for the Research Laboratory

Chapter 2
Figure 2.1 - System to do LPC Spectrum Analysis
Figure 2.2 - System for Computing the "Error Power Ratio" Measure
Figure 2.3 - System for Performing Cepstral Deconvolution for Spectral Envelope Analysis
Figure 2.4 - Calculation of the Cepstral Pitch Metric
Figure 2.5 - LPC Spectrogram of Htanchor (LL1)
Figure 2.6 - LPC Spectrogram of CVSD at 9.6 KBPS (LL1)
Figure 2.7 - LPC Spectrogram of CVSD at 16 KBPS (LL1)
Figure 2.8 - Spectrogram of HY2 at 2.4 KBPS (LL1)
Figure 2.9 - LPC Spectrogram of Longbrake at 2.4 KBPS (LL1)
Figure 2.10 - Cepstral Spectrogram of Htanchor (LL1)
Figure 2.11 - Cepstral Spectrogram of CVSD at 9.6 KBPS (LL1)
Figure 2.12 - Cepstral Spectrogram of CVSD at 16 KBPS (LL1)
Figure 2.13 - Cepstral Spectrogram of HY2 at 2.4 KBPS (LL1)
Figure 2.14 - Cepstral Spectrogram of Longbrake at 2.4 KBPS (LL1)
Figure 2.15 - Plots of d^ Log LPC Spectral Distance Measures for the Synthetic Vowel for Various Bandwidth Distortion Factors. The Distortion is Formed from a^€ora. where a is the Bandwidth Distortion Factor
Figure 2.16 - Plots of Log LPC Spectral Distance Measures for the Synthetic Vowel | a; | for Various Frequency Shift Distortion Ratios
Figure 2.17(a) - Plots of the d^ Log LPC Spectral Distance Measure on Vowels for the Various Distortions Used in this Study
Figure 2.17(b) - Plots of the d£ Log LPC Spectral Distance Measures on Sentences for the Various Distortions Used in this Study
Figure 2.18 - Cepstral Pitch Metric as a Function of Time for Four Different Pitch Distortions for Window No. 1 (Figure 2.3). Window Length = 1
Figure 2.19 - Cepstral Pitch Metric as a Function of Time for Four Different Pitch Distortions for Window No. 2 (Figure 2.3). Window Length = 4
Figure 2.20 - Cepstral Pitch Metric as a Function of Time for Four Different Pitch Distortions for Window No. 3 (Figure 2.3). Window Length = 1C
Figure 2.21 - Cepstral Pitch Metric as a Function of Time for Four Different Distortions for Window No. 4 (Figure 2.3). Window Length = 10
Figure 2.22 - Layout of PARM Access Data Used as Part of this Study. Each Box Represents a Disk File. The Data Is Presorted in the Data Files to Allow Easy Access of the PARM Data Sets

Chapter 3
Figure 3.1 - Generation of Studentized Range Statistic

Chapter 4
Figure 4.1 - Quality Station
Figure 4.2 - An Example "QUALGOL" Program Used to Administer the Communicability Tests

Appendix A
Figure 1 - Specimen Printout of DAM Results

Appendix C
Figure 1 - The Basic System for the Research Laboratory
Figure 2 - The NOVA 830 Graphical Subsystem
Figure 3 - The Audio Subsystem on the NOVA 830
Figure 4 - The Speech Quality Testing Subsystem
Figure 5 - The Optical Data Processing Subsystem
Figure 6 - The Computer Network Subsystem
Figure 7 - The Universal Card Tester
Figure 8 - The NOVA 820 Basic System
Figure 9 - The Micro-Processor Subsystem


LIST OF TABLES

Chapter 2
Table 2.1 Statistics Calculated by "PCHECK" Pitch Comparison Program
Table 2.2 Input Sentences Used in the Initial Qualitative Studies
Table 2.3 Results of the Bandwidth Distortions and Frequency Shift Distortions on Vowels. All Confidence Intervals are at the .05 Level
Table 2.4 Results of the Bandlimit Distortion and Additive Noise Distortion on Vowels. All Confidence Intervals are at the .05 Level
Table 2.5 Results of the Pitch Distortions on Vowels. Note that the Distortions are Low, and Increased Distortions Cause No Increase in the Measures
Table 2.6 Results of the Bandwidth Distortions and Frequency Shift Distortions on Sentences. All Confidence Intervals are at the .05 Level
Table 2.7 Results of the Bandlimit Distortions and Additive Noise Distortion on Sentences. All Confidence Intervals are at the .05 Significance Level
Table 2.8 Results of the Pitch Distortion Study on Vowels. All Confidence Intervals are at the .05 Significance Level
Table 2.9 Comparison of Gain Weighted Dj Log LPC Spectral Metrics to Non-Gain Weighted D^ Log LPC Spectral Metrics
Table 2.10 System Used in the PARM Correlation Study
Table 2.11 Objective Measures Used in the PARM Correlation Study
Table 2.12 Results of Correlation Study for Total Set of Systems
Table 2.13 Results of Correlation Study Using Only Vocoders
Table 2.14 Results of Waveform Coder Using Only Waveform Coders

Chapter 4
Table 4.1 QUALGOL Language
Table 4.5.1 Results of Unscreened First Scoring Tests
Table 4.5.2 Distortion Levels for the Test Digits on the Three Communicability Tests
Table 4.5.3 Results of Screened First Scoring Tests
Table 4.5.4 Results of the Screened Tests Using the Second Scoring Method
Table 4.5.5 Results of Screened Tests Using the Third Scoring Method

Appendix A
Table 1 System Characteristics Evaluated by DAM

Appendix C
Table 1 I/O Devices on the NOVA 830 I/O Buss
Table 2 Standard PC Cards Used in the Modular Construction System
Table 3 I/O Devices on the NOVA 820 I/O Buss


CHAPTER  1 


INTRODUCTION 


1.1 Task History

The engineering effort reported on here was performed at Georgia
Institute of Technology in the School of Electrical Engineering for
the Defense Communications Agency through the Rome Air Development
Center Post-Doctoral Program. The Post-Doctoral Program is under the
direction of Mr. Jake Scherer. The monitoring officer was Dr. William R.
Belfield of the Defense Communications Engineering Center (DCEC).

This task, an investigation of subjective speech quality testing,
objective speech quality testing, and communicability testing, was
undertaken following the development at DCEC of a large data base
associated with PARM and QUART (Paired Acceptability Rating Method and
Quality Acceptance Rating Test). The existence of this data base has
made possible the detailed analysis of subjective testing procedures,
objective testing methods, and communicability testing, with good
cross checking and validity referencing of results.

1.2  Speech  Digitization  Systems  and  Testing  Requirements 

Since it has for some years been clear that some form of end-to-end
speech digitization would be initiated in the Defense Communications
System, a number of speech digitization systems have been developed
in various laboratories around the country. The job of selecting from
these candidate systems the features to be included in a final system
requires extensive evaluation and testing. When a "final" system is
fielded, periodic field testing of all links for continued operational
quality will be a significant requirement. This study attempts to bring
into sharper focus efficient means for developmental and operational
quality testing.

1.3 Personnel, Procedures, and Facilities

This task has been carried out principally by Dr. T. P.
Barnwell, with Dr. A. M. Bush, and with the active involvement
of Dr. R. W. Schafer and Dr. R. M. Mersereau. Student assistants have
included Mr. Ashfaq Arastu, Mr. Bartow Willingham, and Mr. J. D. Marr
here at Georgia Tech. This group also consulted on two occasions with
Dr. W. D. Voiers of Dynastat, Inc., Austin, TX. The project was done
for and with the active help of Dr. William R. Belfield of the Defense
Communications Engineering Center.

Team leader was Dr. T. P. Barnwell. The project was initiated
in May 1976 and completed in May 1977. Although six months' effort was
originally estimated, unavoidable delays in establishing the PARM data
base at Georgia Tech delayed its progress. This report was prepared
at Georgia Tech, tentatively approved in rough draft form at DCEC, and
subsequently reproduced at Georgia Tech.

This work was carried out in the School of Electrical Engineering
Digital Signal Processing Facility. A block diagram is given as
Figure 1.1. A more detailed description of the facility is given in
Appendix C.

1.4 Technical Organization

The work reported here had as its ultimate goal the development
of efficient objective methods and tests for predicting user acceptance
of digital speech transmission systems. Three phases of the attack on
this goal were established: (a) summary investigation of subjective
testing methods; (b) development of a communicability test procedure;
(c) development of objective testing procedures.

The outputs of the study are recommendations for future
subjective test organization and implementation, specification of an
objective testing procedure with cross-validation against PARM
subjective testing results, and specification of a communicability test
philosophy and implementation of the test with results analyzed
statistically. A secondary output is the PARM data base, now organized
for efficient searches.

Work progressed in all three phases in parallel, with some
unexpected delays due to the time required to obtain and organize the
data base from PARM (this is a large data base). A. M. Bush took
principal responsibility for the subjective testing portion, and T. P.
Barnwell was principally responsible for the objective test and the
communicability test. R. W. Schafer and R. M. Mersereau also contributed
to all three phases of the effort.

1.5 Organization of the Report

The detailed aspects of each of the three phases of the effort
are presented in the report with the objective testing study in Chapter
2, the subjective testing study in Chapter 3, and the communicability
test in Chapter 4. Each chapter is headed by an introduction giving
the philosophy and rationale for that phase of the work and the
technical perspective required for that phase.


II.  OBJECTIVE  MEASURES  FOR  SPEECH  QUALITY 


2.1 Introduction

In recent years, considerable effort has been devoted to the
development and implementation of efficient algorithms for digitally
encoding speech signals. These algorithms, which are utilized
chiefly in digital communications systems and digital storage systems,
cover a wide range of techniques, and result in systems which vary
greatly in cost, complexity, data rate, and quality. Generally
speaking, modern speech digitization systems can be divided into four
categories: high rate systems which operate from ~100 KBPS to
~32 KBPS; intermediate rate systems which operate from ~32 KBPS to
~8 KBPS; low rate systems which operate from ~8 KBPS to ~1 KBPS; and
very low rate systems which operate below ~1 KBPS. In the high rate
systems, PCM [2.1] and adaptive PCM [2.2] are the predominant
techniques. In the intermediate rate systems, the techniques are more
varied, including DM [2.3], ADM [2.4] [2.5], DPCM [2.6], ADPCM [2.7],
APC [2.8], and adaptive transform coding [2.9]. The low rate systems
consist mostly of the vocoder techniques, including LPC [2.10-2.13],
channel vocoders [2.14] [2.15], phase vocoders [2.20] [2.21], and several
other techniques [2.22]. Very low rate systems usually involve feature
extraction on a perceptual or linguistic level, and, thus far, very few
systems of this type have been implemented. As a general rule, the
higher data rate systems are less expensive to implement and less
sensitive to bit errors, while the lower rate systems require more
expensive terminals, and result in greater distortions in the presence
of errors.
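
The four rate categories above can be written down as a small lookup. This is only a sketch: the function name `rate_category` is hypothetical, the boundaries in the text are approximate ("~"), and treating each lower boundary as inclusive (and anything at or above ~32 KBPS as high rate) is an assumption made here for illustration.

```python
def rate_category(kbps: float) -> str:
    """Classify a speech coder by bit rate (in KBPS).

    Boundaries follow the approximate figures in the text; which side of
    each boundary a rate falls on is an assumption of this sketch.
    """
    if kbps <= 0:
        raise ValueError("bit rate must be positive")
    if kbps >= 32.0:
        return "high"
    if kbps >= 8.0:
        return "intermediate"
    if kbps >= 1.0:
        return "low"
    return "very low"


# Rates taken from systems named in the List of Figures.
for name, rate in [("CVSD", 16.0), ("CVSD", 9.6), ("Longbrake", 2.4)]:
    print(f"{name} at {rate} KBPS -> {rate_category(rate)} rate")
```
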

The problem of rating and comparing these systems from the
standpoint of user acceptance is a difficult one, particularly since
the candidate systems are usually highly intelligible. Hence,
intelligibility tests, such as the DRT [2.23], may not suffice to resolve
small differences in acceptability. Direct user preference tests such as
the PARM [2.24] have been found useful for this purpose but are not highly
cost effective. Moreover, they provide no diagnostic information which
could be of value in remedying the deficiencies of systems being tested.

Objective  measures  which  can  be  computed  from  sample  speech 
materials  offer  a possible  alternative  to  subjective  acceptability 
measures.  It  should  be  noted,  however,  that  the  perception  of  speech 
is  a highly  complex  process  involving  not  only  the  entire  grammar  and 
the  resulting  syntactic  structure  of  the  language,  but  also  such 
diverse  factors  as  semantic  context,  the  speaker's  attitude  and  emotional 
state,  and  the  characteristics  of  the  human  auditory  system.  Hence,  the 
development  of  a generally  applicable  algorithm  for  the  prediction  of 
user  reactions  to  any  speech  distortion  must  await  the  results  of 
future  research.  However,  the  effects  of  certain  classes  of  distortion 
are  potentially  predictable  on  the  basis  of  present  knowledge.  In 
particular,  substantial  progress  has  been  made  in  quantifying  the 
importance  of  such  acoustic  features  as  pitch,  intensity,  spectral 
fidelity, and speech/noise ratio to the intelligibility, speaker
recognizability, and overall acceptability of the received
speech signal. Thus far, little success has accompanied efforts to
predict the subjective consequences of other than relatively simple
forms of signal degradation, but recent developments in digital signal
processing techniques [2.25] [2.26] suggest a number of efficient objective
measures which could be highly correlated with user acceptability.

In  a recent  study  conducted  by  the  Defense  Department  Consortium 
on  speech  quality,  a large  number  of  speech  digitization  systems  were 
subjectively  tested  using  the  Paired  Acceptability  Rating  Method  (PARM) 
Test [2.24] developed at the Dynastat Corporation. The systems tested
included  a representative  cross-section  of  the  intermediate  rate  and 
low  rate  systems  which  had  been  implemented  in  hardware  at  the  time  of 
the  study,  and,  consequently,  offered  a large  user  acceptability  data 
base  covering  most  classes  of  distortion  present  in  modern  speech 
digitization  algorithms.  The  existence  of  the  PARM  data  base  offered 
a unique  opportunity  to  measure  the  ability  of  objective  measures  to 
predict true subjective acceptability scores. Further, it allows the
development of precise methodologies for the utilization of objective
measures in conjunction with subjective measures to possibly reduce the
cost of speech system quality testing.

This  chapter  describes  a two  part  experimental  study  of  the 
relationship  between  a number  of  objective  quality  measures  and  the 
subjective  acceptability  measures  available  from  the  PARM  study.  In 
the  first  part  of  the  study,  controlled  distortions  were  applied  to 
speech  samples  in  order  to  measure  the  resolving  power  of  the  candidate 
objective  measures  on  these  types  of  distortion.  In  the  second  part, 
the  candidate  objective  measures  were  applied  to  speech  samples  from  the 
same systems on which the PARM tests were run, and the statistical
correlation between the objective and subjective measures was studied.
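
The second part of the study rests on a correlation between an objective measure and a subjective score computed over the same set of systems. A minimal sketch of that computation is given here, assuming the ordinary Pearson product-moment coefficient; the statistical analysis actually used in the study may differ. The function name `pearson_r` and the numbers in the example are hypothetical and are not PARM data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values, one pair per system: an objective distortion
# measure and a subjective acceptability score.
objective = [0.5, 1.0, 1.5, 2.5]
subjective = [82.0, 74.0, 70.0, 51.0]
r = pearson_r(objective, subjective)
print(f"correlation = {r:.3f}")
```

In this made-up example larger distortion goes with lower acceptability, so the coefficient is negative; it is the magnitude that indicates how well the objective measure tracks the subjective one.
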

This chapter consists of five sections. In Section 2.2,
the choice of objective measures is discussed. In Section 2.3, the
"controlled distortion" experiment is presented. In Section 2.4, the
objective-subjective correlation experiment is described. Section 2.5
summarizes the results of this effort, and suggests directions for
future research.

2.2 The Choice of Objective Measures
2.2.1  The  Speech  Perception  Process 

Human  speech  perception  is  a complex  process  in  which  distortions 
in  the  acoustic  signal  do  not  map  simply  onto  perceived  quality.  In 
this  section,  several  aspects  of  speech  perception  which  relate  to 
perceived  speech  quality  will  be  discussed,  and  some  general  conclusions 
will  be  drawn. 

First,  it  should  be  noted  that  the  syntactic  structure  of  a 
language  has  many  components  which  impact  speech  perception.  A sentence 
in  a language  may  be  viewed  as  a concatenation  of  phonemes  which  are 
hierarchically organized into syntactic and semantic units on a
multitude of levels. Phonemes are grouped into syllables, syllables into
words, and words into higher units (compounds, noun phrases, verb
phrases, clauses, sentences, etc.) based on the phrase structure of the
sentence [2.27]. Numerous modern linguists are trying to develop a
comprehensive grammatical theory for the generation of the syntactical
tree structures which represent the underlying sentence organization.
The point here is that a great deal more information than the identity
of the phonemes is being transmitted by the speech signal. Word
boundaries, phrase boundaries, and many other syntactic elements have
explicit correlates in the acoustics. It is these structural correlates
which allow the listener to understand the sentence structure and, hence,
to use his great knowledge of the language to help him perceive the words
themselves. Researchers in speech synthesis [2.28] [2.29] have found
that correctly producing the acoustic correlates of the syntax
is at least as important as correctly producing the acoustic
correlates of the phonemes.

There  is  yet  another  level  of  information  transmitted  in  the 
speech  signal  above  the  syntactic  level.  This  level  is  semantic  in 
nature,  and  incorporates  the  speaker’s  attitudes  about  the  subject 
matter of the utterance. Linguistically, this information lies in the
"intonation" and "emphasis" of the sentence, and this is also explicitly
encoded in the acoustics.

When  perceiving  a sentence,  a listener  uses  all  these  cues, 
phonemic,  syntactic,  and  semantic,  to  help  him  understand  the  utterance. 
All these levels are highly redundant, and, in some cases, a great deal
of acoustic distortion can occur without affecting the intelligibility
or even the quality of the speech. However, in other cases, very
slight distortions, such as those which affect the perception of
syntactic structure, can cause complete loss of intelligibility. What is
important in understanding the effect of a particular distortion is
understanding the way in which it interacts with the entire complex
speech understanding process. At this point in time, even a simple
complete enumeration of the information in a sentence is beyond the
scope of current theory. This is why the problem of developing general
objective quality measures is so difficult.

This  is  not  to  say,  however,  that  there  is  not  considerable 
knowledge  about  the  acoustic  correlates  of  the  features  of  speech.  It 
is  well  established  that  the  phonemic  information  is  primarily  found  in 
the  acoustic  filtering  effect  of  the  upper  vocal  tract,  and  hence,  in 
the short time spectral envelope of the speech. Likewise, it is well
known that phase information, other than pitch, is not perceivable [2.22].
Also, it has been well demonstrated that a great deal of information
about consonantal identities is found in the formant behavior of the
adjacent vocalics. But there are other phonemic acoustic correlates in
English besides the spectral envelope. For example, voicing information
in consonants is found in the durations of adjacent vowels and in the
local pitch contour [2.30].

The  major  acoustic  correlates  of  syntactic  structure,  intonation, 
and emphasis are pitch, vowel durations, and intensity. Of these
correlates, pitch is by far the strongest [2.31] [2.32], followed by
duration, and then intensity. There is also evidence that there are some
effects in the spectral envelope which are involved in the perception of
these "suprasegmentals," though these are small.

When developing objective quality measures for intermediate rate
and low rate digitization systems, an important point is that, due to the
nature  of  the  systems  themselves,  only  certain  classes  of  distortions  can 
occur.  For  example,  phoneme  durations,  which  are  very  important  in 
perception  of  both  phonemic  and  structural  information,  are  not  altered 
by  coding.  In  vocoder  systems,  where  the  spectral  envelope,  pitch  and 
excitation,  and  gain  information  are  separated  naturally  as  part  of  the 
digitization  process,  the  mapping  of  the  various  parameters  onto  the 
perceptual  domain  is  relatively  easy  to  characterize.  To  detect 
distortion  related  to  phonemic  perception,  spectral  distance  measures 
seem  most  important.  Since  the  pitch  contour  plays  such  an  important 
role  in  perception,  some  sort  of  excitation  comparison  should  also  be 
used.  Since  gain  is  relatively  less  important,  it  is  expected  that 
only  gross  gain  errors  should  be  detected. 


In the case of waveform coders, the distortions are not so easily
related to perception. Pitch information is not likely to be affected,
but simple signal/noise ratios are not obviously good candidates for
quality measures. A more likely candidate might be a measure based on
the noise spectrum at the receiver.

2.2.2  Specific  Objective  Quality  Measures 

In this section, all of the objective quality measures tested
in this study will be presented. Not all of the measures studied were
necessarily metrics. In order to qualify as a true metric, a distortion
measure, D(X,Y), between two signals, X and Y, must meet the following
conditions:


1. $D(X,Y) = 0$ iff $X = Y$, and $D(X,Y) > 0$ if $X \neq Y$

2. $D(X,Y) = D(Y,X)$

3. $D(X,Y) \leq D(X,Z) + D(Z,Y)$

Some of the distortion measures in this study meet these requirements,
while others do not.
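
The three conditions can be checked numerically on a finite set of sample signals. The sketch below is illustrative only: `is_metric_on`, `euclidean`, and `mean_ratio` are hypothetical names, and the sample vectors are made up. Passing the check on a finite sample does not prove that a measure is a metric, but a single failure is enough to show that it is not.

```python
import itertools
import math


def is_metric_on(points, dist, tol=1e-12):
    """Test the three metric conditions for `dist` on a finite sample."""
    for x, y in itertools.product(points, repeat=2):
        d = dist(x, y)
        if x == y and abs(d) > tol:          # condition 1: D(X,X) = 0
            return False
        if x != y and d <= tol:              # condition 1: D(X,Y) > 0 if X != Y
            return False
        if abs(d - dist(y, x)) > tol:        # condition 2: symmetry
            return False
    for x, y, z in itertools.product(points, repeat=3):
        if dist(x, y) > dist(x, z) + dist(z, y) + tol:  # condition 3
            return False
    return True


def euclidean(x, y):
    """Root-sum-square difference between two sampled spectra."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def mean_ratio(x, y):
    """Average sample-by-sample ratio; equals 1, not 0, when x == y."""
    return sum(a / b for a, b in zip(x, y)) / len(x)


samples = [(1.0, 2.0), (2.0, 1.0), (4.0, 4.0)]
print(is_metric_on(samples, euclidean))   # difference-based measure
print(is_metric_on(samples, mean_ratio))  # ratio-based measure
```
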

2.2.2.1 Spectral Distance Measures

Spectral distance, in this context, refers to a distance measure
between a sampled spectral envelope of the source (unprocessed) speech
signal and that of a degraded form of the signal. Since there are many
methods for approximating the "short time spectrum" of a signal, there
are correspondingly many metrics which may be formed from a speech
signal. A good measure should have two characteristics: it should
consistently reflect perceptually significant distortions of different
types; and it should be highly correlated with subjective quality results.

A total of sixteen spectral distance measures and related measures were studied in this project. Let V(θ) be the short time power spectral envelope for a frame of the original sentence and let V'(θ) be the power spectral envelope for the corresponding frame of the distorted sentence. In this discussion, it is assumed that the proper time synchronization has occurred, and that V(θ) and V'(θ) are for the same frame of speech. Because gain variations are not of interest here, the spectra V(θ) and V'(θ) may be normalized to have the same arithmetic mean in either a linear or a log form. A geometric distance between the distorted and original spectra may be taken in several ways, including the direct spectral distance

\[ D(\theta) = V(\theta) - V'(\theta), \tag{2.1} \]

the difference in the log spectra

\[ D(\theta) = 10 \log_{10} V(\theta) - 10 \log_{10} V'(\theta), \tag{2.2} \]

the source normalized distance measure,

\[ D(\theta) = \frac{V(\theta) - V'(\theta)}{V(\theta)}, \tag{2.3} \]

and the ratio of power spectra

\[ D(\theta) = \frac{V(\theta)}{V'(\theta)}. \tag{2.4} \]

Of these measures, 2.1 and 2.2 can form the basis for true metrics, while 2.3 and 2.4 cannot. A large class of distance measures can be defined as the weighted L_p norm "d_p" by

\[ d_p(V,V') = \left[ \frac{\int_{-\pi}^{\pi} W(V,V',\theta)\, |D(\theta)|^{p}\, d\theta}{\int_{-\pi}^{\pi} W(V,V',\theta)\, d\theta} \right]^{1/p}, \tag{2.5} \]

where W(V,V',θ) is a weighting function which allows functional weighting based on either of the power spectral envelopes or on frequency. In this study, W(V,V',θ) = 1, and 2.5 reduces to

\[ d_p(V,V') = \left[ \frac{1}{2\pi} \int_{-\pi}^{\pi} |D(\theta)|^{p}\, d\theta \right]^{1/p}. \tag{2.6} \]

Clearly, the higher the value of "p," the greater the emphasis on large spectral distances. This measure may be digitally approximated by sampling D(θ), giving

\[ d_p(V,V') \approx \left[ \frac{1}{K} \sum_{k=1}^{K} |D(\theta_k)|^{p} \right]^{1/p}. \tag{2.7} \]
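The sampled form of the L_p norm can be sketched in Python as follows. This is a minimal illustration, assuming the log spectral difference of 2.2, uniform weighting (W = 1), and gain normalization done by removing the mean log difference. The function name `lp_log_spectral_distance` is hypothetical.

```python
import math


def lp_log_spectral_distance(v, v_prime, p=2):
    """Sampled d_p (in dB) between two power spectral envelopes.

    v, v_prime: positive power spectrum samples taken on the same frequency grid.
    """
    if len(v) != len(v_prime) or not v:
        raise ValueError("spectra must be non-empty and equally long")
    # Log spectral difference D(theta_k) at each sample point.
    diffs = [10.0 * math.log10(a) - 10.0 * math.log10(b)
             for a, b in zip(v, v_prime)]
    # Assumed gain normalization: remove the mean log difference so that
    # a pure gain change between the two spectra scores zero.
    mean = sum(diffs) / len(diffs)
    diffs = [d - mean for d in diffs]
    return (sum(abs(d) ** p for d in diffs) / len(diffs)) ** (1.0 / p)
```

Raising p from 2 to 4 in this sketch increases the contribution of the largest local differences, which is the emphasis effect described above.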
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2. 2. 2. 1.1  The  LPC  Spectral  Distance  Measures 

Since the output speech waveform is a convolution between a spectral envelope "filter" and an excitation signal, a deconvolution is necessary for spectral envelope comparisons. The LPC analysis is itself a parametric spectral estimation process, and may be used to extract an approximation of the spectral envelope. The block diagram for an LPC spectral analysis system is given in Figure 2.1. If the LPC parameters are (a_1, ..., a_N), then the spectrum function V(θ) is given by

\[ V(\theta) = \frac{1}{\left| A(e^{j\theta}) \right|^{2}}, \tag{2.8} \]

where

\[ A(z) = 1 - \sum_{i=1}^{N} a_i z^{-i}. \tag{2.9} \]

FIGURE 2.1 SYSTEM TO DO LPC SPECTRUM ANALYSIS

This approximation can be used to calculate any of the measures suggested above.
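A minimal Python sketch of evaluating an LPC spectral envelope from the predictor coefficients is given below. It assumes the all-pole form V(θ) = 1/|A(e^{jθ})|^2 with A(z) = 1 - Σ a_i z^{-i}; the sign convention, the omission of a gain term, and the name `lpc_power_spectrum` are assumptions made for illustration.

```python
import cmath
import math


def lpc_power_spectrum(a, num_points=64):
    """Sample 1 / |A(e^{j theta})|^2 at num_points frequencies on [0, pi).

    a: predictor coefficients a_1..a_N (assumed convention: A(z) = 1 - sum a_i z^-i).
    """
    spectrum = []
    for m in range(num_points):
        theta = math.pi * m / num_points
        # Evaluate the inverse filter polynomial on the unit circle.
        a_of_z = 1.0 - sum(coef * cmath.exp(-1j * theta * (i + 1))
                           for i, coef in enumerate(a))
        spectrum.append(1.0 / abs(a_of_z) ** 2)
    return spectrum
```

The resulting samples can be passed to any of the distance forms above once the original and distorted frames have been analyzed on the same frequency grid.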

There are a number of additional measures which can be calculated from A(z). These are not true spectral distance metrics or measures, but are related, and have the additional feature that they are easy to calculate. Several of these measures are simply geometric distances in the parameter domains, such as feedback coefficients, PARCOR coefficients, area functions, and pole locations. In each of these cases, we can define d_p as

\[ d_p(\xi,\xi') = \left[ \sum_{m=1}^{N} |\xi_m - \xi'_m|^{p} \right]^{1/p}, \tag{2.10} \]

where ξ_m is the m-th parameter (PARCOR coefficient, area function, etc.), and N is the number of parameters involved in the representation.

Another related approach is illustrated in Figure 2.2. The original speech signal is analyzed using an LPC analysis, and the inverse filtered waveform is formed by

\[ e_n = s_n - \sum_{j=1}^{N} a_j s_{n-j}, \tag{2.11} \]

where a_j is the j-th LPC coefficient and s_n is the speech sample. This optimal filter is then used to inverse filter the distorted waveform, resulting in

\[ e'_n = s'_n - \sum_{j=1}^{N} a_j s'_{n-j}. \tag{2.12} \]

The measure which is used is then

\[ d = \frac{\sum_{n=1}^{L} (e'_n)^{2}}{\sum_{n=1}^{L} e_n^{2}}, \tag{2.13} \]

where L is the total number of samples in the utterance.

2.2.2.1.2 Cepstral Spectral Distance Measures

Another technique used often for deconvolving the spectral envelope from the excitation is cepstral analysis [2.33][2.34]. The analysis system for cepstral analysis is shown in Figure 2.3. By Parseval's Theorem, d_2 can be calculated from the cepstrum by

\[ d_2 = \left[ \sum_{k=-\infty}^{\infty} (c_k - c'_k)^{2} \right]^{1/2}, \tag{2.14} \]

where c_k and c'_k are the cepstral components for the original and the test signal respectively. For the same reason that cepstral deconvolution works well on speech, only a few coefficients need to be used (< 40) to calculate d_2. Since the cepstral measure is computationally intensive (2 FFT's per frame) and since it has been shown that d_2 calculated from A(z) is very highly correlated with d_2 calculated from the cepstrum [2.35], it does not appear that the cepstral measure is very attractive. However, the cepstral measure is attractive for excitation feature extraction (see 2.2.2.2.2); since the low order cepstral coefficients are a by-product of that analysis, and since CCD's offer potential for cheap FFT's using the Chirp-Z Transform, cepstral measures are worthy of consideration.

FIGURE 2.3 SYSTEM FOR PERFORMING CEPSTRAL DECONVOLUTION FOR SPECTRAL ENVELOPE ANALYSIS.
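The Parseval relation behind the cepstral form of d_2 can be illustrated with a short Python sketch. It uses a naive inverse DFT in place of an FFT, natural-log spectra rather than dB, and a log spectrum that is symmetric about zero frequency. The names `real_cepstrum` and `cepstral_distance` are hypothetical.

```python
import cmath
import math


def real_cepstrum(log_spectrum):
    """Inverse DFT (naive O(N^2)) of a log power spectrum sampled at
    N equally spaced points around the unit circle."""
    n = len(log_spectrum)
    return [
        sum(value * cmath.exp(2j * math.pi * k * m / n)
            for m, value in enumerate(log_spectrum)).real / n
        for k in range(n)
    ]


def cepstral_distance(c, c_prime, num_coeffs=None):
    """Square root of the summed squared cepstral differences.

    With num_coeffs=None every coefficient is used; a smaller value
    truncates the sum to the first num_coeffs coefficients.
    """
    k_max = len(c) if num_coeffs is None else min(num_coeffs, len(c))
    return math.sqrt(sum((c[k] - c_prime[k]) ** 2 for k in range(k_max)))
```

When all coefficients are used, the value equals the RMS difference of the two sampled log spectra; truncating the sum can only lower it.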

2.2.2.2 Excitation Feature Extraction

Pitch  is  a very  important  acoustic  correlate  of  many  supersegmen- 
tal  features,  and  distortions  in  the  pitch  contour  are  easily  perceivable 
and  very  detrimental  to  quality.  Pitch  estimation  errors  and  voiced/ 
unvoiced  errors  may  occur  in  any  pitch  excited  vocoder  system.  Hence, 
it  is  of  interest  to  investigate  objective  measures  for  comparing 
excitation  features  for  those  systems  where  it  is  applicable. 

The  ideal  solution  to  this  problem  would  be  to  generate  high 
quality  pitch  contours  for  the  original  utterances,  and  to  compare 
these  to  the  values  used  by  the  vocoder  synthesis  algorithm.  However, 
since  the  excitation  parameters  are  not  explicitly  available  in  vocoder 
systems,  and  since  the  excitation  data  is  not  available  for  the  systems 
used  in  the  PARM  test,  then  this  approach  is  unreasonable. 

A second possibility is to apply a high quality pitch detector to both the original and the distorted speech, and to compare these results. A system which compares pitch excitation contours was developed at Georgia Tech under a previous effort [2.36] along with several high quality pitch detection programs. The statistics computed by the pitch comparison program (PCHECK) are enumerated in Table 2.1. This approach was studied experimentally using the Hard Limited Autocorrelation Pitch Detector [2.36] and the Multiband Pitch Detector [2.36].
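A comparison of two pitch contours along these lines might be sketched as follows. This is not the PCHECK program itself; the function name, the frame representation (pitch period in samples, 0 for unvoiced), the default threshold, and the treatment of an error exactly equal to the threshold as subtle are all assumptions.

```python
def compare_pitch_contours(reference, test, gross_threshold=10):
    """Count pitch and voicing disagreements between two per-frame contours.

    reference, test: pitch period per frame in samples, with 0 meaning unvoiced.
    """
    voicing_errors = 0
    errors = []
    for ref, tst in zip(reference, test):
        if (ref == 0) != (tst == 0):
            # One contour is voiced and the other is not.
            voicing_errors += 1
        elif ref != tst:
            # Both voiced, but the periods differ.
            errors.append(abs(ref - tst))
    gross = [e for e in errors if e > gross_threshold]
    subtle = [e for e in errors if e <= gross_threshold]

    def mean(values):
        return sum(values) / len(values) if values else 0.0

    return {
        "pitch_errors": len(errors),
        "gross_errors": len(gross),
        "mean_gross_error": mean(gross),
        "subtle_errors": len(subtle),
        "mean_subtle_error": mean(subtle),
        "voicing_errors": voicing_errors,
    }
```

Splitting the errors at a threshold separates octave-type failures of the pitch detector from small period jitter, which are likely to differ in perceptual effect.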

STATISTICS

1. Total number of pitch errors
2. The average errors per sample in voiced regions
3. The number of gross errors (greater than a threshold)
4. The average gross errors
5. The number of subtle errors (less than a threshold)
6. The average subtle errors
7. The number of voicing errors
8. Sample standard deviations from the above averages

Table 2.1 Statistics Calculated by "PCHECK" Pitch Comparison Program

A third possible approach involves developing a measure for excitation differences which does not depend on any pitch detection algorithm. The idea is to use a deconvolution technique which is aimed at retrieving the excitation representation rather than the spectral envelope representation. The cepstrums of the two speech signals have many features which suggest that they might be good candidates for an excitation distance measure. First, they have a region in which the signal characteristics are almost entirely representative of the excitation function. Second, since this region is easily identifiable, no pitch decision or voiced/unvoiced decision is necessary. Third, the shape of the cepstrum in the excitation region contains some additional information about the excitation besides just pitch. Last, the computation of the cepstrum leads to a spectral envelope representation which might also be used as part of a spectral distance measure.

The way in which an excitation distance measure might be calculated is illustrated in Figure 2.4. After the cepstrum of the two signals is calculated, a smoothing filter is used to make the measure less severe. Next, a weighted distance metric is calculated from c_k and c'_k, the cepstral coefficients for the original and distorted speech respectively, using a weighting function W(C,C',k). In this study, the weighting functions which were studied were W(C,C',k) = 1 (no weight) and W(C,C',k) = c_k, which weights samples near pitch peaks more than those in unvoiced regions.

2.2.2.3 Noise Power Measures

Traditionally, signal-to-noise ratio has been one of the predominant measures for determining the performance of waveform coding systems. This measure is attractive since it is so easily calculated and since values for this measure are known for most appropriate systems. It is unattractive since it is difficult to evaluate in light of what is known about speech perception.

A far more interesting approach might be to develop a measure based on the coloration of the noise as well as its power. In short, if noise is defined as

\[ n_i = s_i - s'_i, \tag{2.16} \]

where s_i and s'_i are samples of the original and distorted speech respectively, then the noise spectral envelope N(θ) could be found using LPC or cepstral techniques as before. A measure could be defined such that

\[ n_p = \frac{\int_{-\pi}^{\pi} W(\theta)\, N^{p}(\theta)\, d\theta}{\int_{-\pi}^{\pi} W(\theta)\, d\theta} \tag{2.17} \]

and

\[ d_p = \left[ \frac{\sum_{i=1} s_i^{p}}{n_p} \right]^{1/p}. \tag{2.18} \]

This would be attractive since it would allow some measure of the spectral characteristics of the noise, which is very likely to have perceptual impact. If W(θ) = 1, then, by Parseval's Theorem, this measure becomes the signal-to-noise ratio for p = 2.
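For reference, the conventional time domain signal-to-noise ratio, with the noise taken as the sample-by-sample difference between the original and the distorted waveform, can be sketched in Python as below. The function name `snr_db` is hypothetical, and the two waveforms are assumed to be time aligned and equally long.

```python
import math


def snr_db(original, distorted):
    """Signal-to-noise ratio in dB, with noise n_i = s_i - s'_i."""
    noise = [s - d for s, d in zip(original, distorted)]
    signal_power = sum(s * s for s in original)
    noise_power = sum(n * n for n in noise)
    if noise_power == 0.0:
        return math.inf  # identical waveforms
    return 10.0 * math.log10(signal_power / noise_power)
```

This figure weights all frequencies equally, so it carries no information about the coloration of the noise.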

Though this represented a very interesting area for study, very little was done on noise measurements in this study. This is because the data base associated with the PARM was not in a form to make the necessary computations reasonable.

2.3 Initial Qualitative Studies and Controlled Distortions

This section describes two phases of the experimental study. In the first phase, example sentences from various systems were digitized from analog magnetic tape, and various forms of gain measures and spectral measures were applied and studied. In the second phase, the measures presented in the previous section (2.2) were applied to sentences which contained controlled distortions to test these measures for consistency in measuring these distortions, to check the measurement of combined distortions, and, by using the histograms of time behavior of the various measures, to determine a potential resolving power for each measure.

2.3.1  Qualitative  Studies 

In the initial study, a total of 20 sentences from two speakers and five systems were digitized from analog tape (digital tape representations were not available at that time), and stored on disk. (See Table 2.2.) A subgroup of those sentences was then analyzed for energy contours and for spectral representations and cepstral spectral analysis.

The energy was measured by applying Kaiser windows [2.37] of various lengths as FIR filters to the squared waveforms. The window lengths were adjusted such that pitch periods were not obvious in the energy representations. These energy plots were then used to try to synchronize the sentences with one another.
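The energy measurement described here can be sketched in Python as a Kaiser window applied as a causal FIR filter to the squared waveform. The window length, the shape parameter `beta`, the unit-sum normalization, and the function names are assumptions made for illustration, since the study does not list its values.

```python
import math


def bessel_i0(x, terms=25):
    """Zeroth-order modified Bessel function, by its power series."""
    total, term = 1.0, 1.0
    for k in range(1, terms):
        term *= (x / (2.0 * k)) ** 2
        total += term
    return total


def kaiser_window(length, beta):
    """Kaiser window of the given length and shape parameter beta."""
    if length == 1:
        return [1.0]
    half = (length - 1) / 2.0
    denom = bessel_i0(beta)
    return [
        bessel_i0(beta * math.sqrt(max(0.0, 1.0 - ((n - half) / half) ** 2))) / denom
        for n in range(length)
    ]


def energy_contour(signal, length, beta=6.0):
    """Smooth the squared waveform with a Kaiser window used as an FIR filter."""
    window = kaiser_window(length, beta)
    norm = sum(window)  # unit gain at zero frequency
    squared = [s * s for s in signal]
    contour = []
    for n in range(len(squared)):
        acc = 0.0
        for k, w in enumerate(window):
            if n - k >= 0:
                acc += w * squared[n - k]
        contour.append(acc / norm)
    return contour
```

A window spanning several pitch periods averages out the pulse-by-pulse energy variation, which is the condition the text describes for choosing the length.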

Several results came out of this study. First, not unexpectedly, the energy plots for the waveform coders (CVSD 16 and CVSD 9.6) were very similar to that of the high anchor (original). Second, the energy plots for the vocoders (Longbrake 2.4 and HY2 2.4) were very different from the high anchor and very different from each other. Attempts to synchronize the utterances using the gain waveforms result in different synchronizations than if the waveforms are synchronized visually. The point here is that the local intensity of a speech waveform is not a highly perceivable quantity, and vocoders take advantage of this by doing relatively poor gain estimation. This suggests that energy is probably not a good candidate for an objective quality measure.

SYSTEM                  TEST UTTERANCES

HI ANCHOR               LL1*   LL2   CH1   CH2
CVSD (16 KBPS)          LL1*   LL2   CH1   CH2
CVSD (9.6 KBPS)         LL1*   LL2   CH1   CH2
LONGBRAKE (2.4 KBPS)    LL1*   LL2   CH1   CH2
HY2 (2.4 KBPS)          LL1*   LL2   CH1   CH2

* Part of Subtest Group

Table 2.2 Input Sentences Used in the Initial Qualitative Studies

Another point should be made here. The synchronization efforts point up clearly that the use of analog magnetic tape for recording utterances is generally unacceptable. Effects which (we presume) are due to the stretching of the analog tapes prevented synchronization from being maintained for more than 1-2 seconds. Carefully synchronized digital playback and recording systems must be used as a basis for reasonable objective measures.

In the second part of this study, 10 pole LPC spectral analysis and 40 coefficient cepstral spectral analysis were performed on the five test sentences, and 3-D perspective plots were produced. These plots are shown in Figures 2.5-2.14. Several points were observed from these plots. First, the peaks in the LPC spectra were generally sharper than those of the cepstral spectra. Second, however, the cepstral spectra, on the whole, had much more local variation than the LPC spectra. Third, the spectral variations caused by the waveform coders were more noticeable in the LPC case than in the cepstral case. On the whole, no clear advantage for either of the two analyses could be found from these plots.


FIGURE 2.6 LPC SPECTROGRAM OF CVSD AT 9.6 KBPS (LL1)

FIGURE 2.9 LPC SPECTROGRAM OF LONGBRAKE AT 2.4 KBPS (LL1)

FIGURE 2.11 CEPSTRAL SPECTROGRAM OF CVSD AT 9.6 KBPS (LL1)

FIGURE 2.12 CEPSTRAL SPECTROGRAM OF CVSD AT 16 KBPS (LL1)

FIGURE 2.13 CEPSTRAL SPECTROGRAM

FIGURE 2.14 CEPSTRAL SPECTROGRAM OF LONGBRAKE AT 2.4 KBPS (LL1)
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2.3.2  The  Controlled  Distortion  Experiment 

The  purpose  of  the  controlled  distortion  experiments  was  to  test 
the  candidate  measures  discussed  in  Section  2.2  as  to  their  resolving 
power for measuring certain classes of distortions. In all cases, the
"original"  was  taken  to  be  the  output  of  a 12  tap  LPC  synthesis  program 
where  the  coefficients  were  unquantized  and  the  pitch  was  extracted  by 
hand.  Two  sets  of  signals  were  used.  One  set  consisted  of  four 
synthetic  vowels  (/i/#^p/,/u/  and  /A/) , the  other  of  two  sentences,  one 
spoken  by  a male  speaker  and  one  spoken  by  a female  speaker.  In  all 
cases,  five  classes  of  distortions  were  applied:  bandwidth  distortion; 
frequency  distortion;  pitch  distortion;  low  pass  filtering  distortion; 
and  additive  noise. 

2.3.2.1 Bandwidth Distortion

Distortions in the bandwidth of formants are a common occurrence in vocoders. To test this type of distortion, the unit circle was effectively expanded by transforming each LPC coefficient by

\[ a_i \rightarrow a_i \alpha^{i}. \tag{2.19} \]

In this experiment, the four values of α which were used were .99, .98, .97, and .95. The first two values introduced no perceivable distortion.
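The coefficient transformation can be sketched in a few lines of Python. The sketch assumes the scaling a_i -> a_i α^i and the convention A(z) = 1 - Σ a_i z^{-i}, under which every pole radius is multiplied by α; the function name `expand_bandwidth` and the example pole values are hypothetical.

```python
import math


def expand_bandwidth(a, alpha):
    """Scale the i-th predictor coefficient by alpha**i (i counted from 1)."""
    return [coef * alpha ** (i + 1) for i, coef in enumerate(a)]


# Example: one conjugate pole pair at radius r and angle phi, for which
# a_1 = 2 r cos(phi) and a_2 = -r^2 under the assumed sign convention.
r, phi, alpha = 0.9, 1.0, 0.95
a = [2.0 * r * math.cos(phi), -r * r]
a_wide = expand_bandwidth(a, alpha)
pole_radius = math.sqrt(-a_wide[1])  # radius of the pole pair after scaling
```

Moving the poles toward the origin broadens each formant resonance without changing its center frequency, which is why this transformation isolates bandwidth distortion.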

2.3.2.2 Frequency Distortion

The frequency distortion was carried out by up or down sampling the impulse response of the LPC synthesizer. Figure 2.15 shows the procedure. First, an FIR (256 point) approximation for the IIR impulse response was calculated. Then a zero padded interpolation was performed using a 1000 point Kaiser window designed linear phase low pass filter. The resulting modified impulse response was used to synthesize the speech samples. Sampling ratios of 49-50, 50-49, 9-10, and 10-9 were used.

2.3.2.3 Pitch Distortion

Pitch distortion was applied by allowing the pitch period to systematically increase or decrease over the voiced regions. This results in pitch distortions which increased with time in each utterance. The rates at which the periods were allowed to vary were +1 sample every 10 voiced frames, +1 sample every 4 voiced frames, -1 sample every 10 voiced frames, and -1 sample every 4 voiced frames.
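One reading of this drift schedule is sketched below in Python. It assumes that the offset accumulates across the whole utterance, that unvoiced frames (marked 0) neither advance the schedule nor are altered, and that the first step occurs after the first full group of voiced frames. The function name `drift_pitch_periods` is hypothetical.

```python
def drift_pitch_periods(periods, frames_per_step, step):
    """Apply a cumulative pitch period offset of `step` samples after every
    `frames_per_step` voiced frames.

    periods: pitch period per frame in samples, with 0 meaning unvoiced.
    """
    distorted = []
    voiced_seen = 0
    for period in periods:
        if period == 0:
            distorted.append(0)  # unvoiced frames pass through unchanged
            continue
        distorted.append(period + step * (voiced_seen // frames_per_step))
        voiced_seen += 1
    return distorted
```

For example, `drift_pitch_periods(periods, 10, 1)` corresponds to the "+1 sample every 10 voiced frames" condition under these assumptions.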

2.3.2.4 Low Pass Filter Distortion

Bandlimiting distortions are very common in speech communication systems, and hence worthy of study. The filters used were all 10th order recursive digital elliptic filters with rejection bands at -60 dB. In all, four filters were used with cutoffs at 1.4 kHz, 1.8 kHz, 2.2 kHz, and 2.8 kHz.

2.3.2.5 Additive White Noise Distortion

White Gaussian noise was also added to the test signals. Four noise levels were used which resulted in signal to noise ratios of approximately 13 dB, 10 dB, 7 dB, and 3 dB.
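Adding white Gaussian noise at a prescribed signal-to-noise ratio can be sketched in Python as follows. The function name `add_white_noise` and the fixed seed are assumptions. The noise is rescaled so that the realized ratio matches the target for the given signal, which may differ from how the levels were set in the study.

```python
import math
import random


def add_white_noise(signal, snr_db, seed=0):
    """Return signal plus white Gaussian noise scaled to the requested SNR (dB)."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    signal_power = sum(s * s for s in signal) / len(signal)
    noise_power = sum(n * n for n in noise) / len(noise)
    # Noise power needed to reach the target ratio.
    target_noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    scale = math.sqrt(target_noise_power / noise_power)
    return [s + scale * n for s, n in zip(signal, noise)]
```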

2.3.3  The  Experimental  Results 

In  all,  six  utterances,  four  vowels  .768  seconds  in  length  and 
two  sentences  3.072  seconds  in  length,  were  used  as  originals.  A total 
of  four  distortions  for  each  of  the  five  classes  were  applied  to  the  six 
speech  samples,  giving  120  distorted  samples.  The  purpose  of  the  vowel 
distortion  study  was  to  measure  the  effects  of  each  measure  in  a "micro" 
sense  in  order  to  compare  resolving  powers  of  the  different  measures. 

The purpose of the full sentence distortions was to measure the "macro" behavior of each objective measure. In all cases, the total sentence metric was calculated from

\[ D_p = \frac{\sum_{m=1}^{M} W(m)\, d_{p,m}}{\sum_{m=1}^{M} W(m)}. \]

In this expression, D_p is the total distortion for the entire sentence set, W(m) is a weighting function, d_{p,m} is the "d_p" measure defined in Section 2.2.2 at the m-th frame of the analysis, and M is the total number of analysis frames. W(m) was taken to be

\[ W(m) = 1, \tag{2.20} \]

and

\[ W(m) = G_m, \tag{2.21} \]

where G_m is the LPC gain of the original sentence in the m-th frame. The LPC analyses were always done with a Hamming windowed, autocorrelation LPC with a frame interval of 250 samples and a window width of 250 samples. The gain weighting here was included to see how the overall outcome would be affected, as a matter of academic interest. The hypothesis is that, since the vocalics contain a large portion of the information, and since the gain is always greater for vocalics, a gain weighted measure might be more highly correlated with perceptual results. This experiment, clearly, gives no new information on this hypothesis, but it does show to what extent gain weighting changes the final objective quality estimate.
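The frame averaging step can be sketched in Python as a normalized weighted mean of the per-frame distances. The normalization by the sum of the weights and the function name `sentence_distortion` are assumptions made for illustration.

```python
def sentence_distortion(frame_distances, weights=None):
    """Weighted mean of per-frame distances.

    weights=None gives uniform weighting (W(m) = 1); passing the per-frame
    LPC gains gives the gain weighted form (W(m) = G_m).
    """
    if weights is None:
        weights = [1.0] * len(frame_distances)
    if len(weights) != len(frame_distances) or not frame_distances:
        raise ValueError("need one weight per frame and at least one frame")
    return sum(w * d for w, d in zip(weights, frame_distances)) / sum(weights)
```

If the largest frame distances fall in low gain frames, the gain weighted value in this sketch comes out below the uniform one, for example 1.5 against 2.0 for distances [1.0, 3.0] with gains [3.0, 1.0].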

In all cases, D_p was taken to be the sum of M independent random variables, all with the same standard deviation. The sample variance was calculated from

\[ \hat{\sigma}^{2} = \frac{\sum_{m=1}^{M} (d_{p,m} - D_p)^{2}}{M-1}. \tag{2.22} \]

The random variable

\[ t = \frac{D_p - \bar{D}_p}{\hat{\sigma} / \sqrt{M}} \tag{2.23} \]

is t distributed (see Chapter 3) with zero mean and unit variance. A confidence interval for \bar{D}_p, the true mean for D_p, for a significance level α (α = .01 and .05) can be calculated from

\[ D_p - U_{\alpha M} \frac{\hat{\sigma}}{\sqrt{M}} \le \bar{D}_p \le D_p - L_{\alpha M} \frac{\hat{\sigma}}{\sqrt{M}}, \tag{2.24} \]

where L_{αM} and U_{αM} are the lower and upper significance limits for a t distributed random variable (μ = 0, σ = 1) for M points and probability α.
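A confidence interval of this kind can be sketched in Python as below. The sketch assumes the usual standard error of the mean (sample standard deviation divided by the square root of M) and a symmetric two-sided interval. The critical value of the t distribution is supplied by the caller, since the Python standard library has no t quantile function, and the function name `mean_confidence_interval` is hypothetical.

```python
import math


def mean_confidence_interval(frame_distances, t_critical):
    """Two-sided confidence interval for the mean of per-frame distances.

    t_critical: upper critical value of the t distribution for the desired
    significance level and M - 1 degrees of freedom, supplied by the caller.
    """
    m = len(frame_distances)
    if m < 2:
        raise ValueError("at least two frames are required")
    mean = sum(frame_distances) / m
    variance = sum((d - mean) ** 2 for d in frame_distances) / (m - 1)
    half_width = t_critical * math.sqrt(variance / m)
    return mean - half_width, mean + half_width
```

For five frames at the .05 level the tabulated two-sided critical value is 2.776 (4 degrees of freedom), so `mean_confidence_interval(values, 2.776)` gives the interval in that case.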

2.3.3.1 Results of the Vowel Tests

The results of the vowel tests for frequency distortion and bandwidth distortion are compiled in Table 2.3, the results for low pass filtering distortion and noise distortion are given in Table 2.4, and the results for pitch distortion are given in Table 2.5.

Several points should be made about these results. First, all of the tests seem to perform relatively well on the two frequency distortions, with all tests able to resolve the distortions at least at
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SPECIAL 

DISTORTION 

MEASURES 


BANDWIDTH DISTORTIONS


LOG  LPC 

AV. 

.076 

C.I. 

.03 

LOG  LPC 

AV. 

.081 

C.I. 

.03 

LOG  LPC 

AV. 

.12 

C.I. 

.05 

LINEAR  LPC 

AV. 

1280 

C.I. 

825 

CEPSTRUM

AV. 

.088 

C.I. 

.03 

PARCOR 

AV. 

1.1 

C.I. 

.06 

FEEDBACK 

AV. 

113 

C.I. 

61 

AREA 

AV. 

1.1 

C.I. 

0.2 

POLE  LOCATION 

AV. 

2.3 

C.I. 

.93 

FREQUENCY  SHIFT  DISTORTIONS 
SHIFT  RATIOS 


50/49 

49/50 

10/9 

9/10 

.08 

.07 

.91 

.83 

.03 

.03 

.11 

.10 

.11 

.10 

1.2 

.90 

.04 

.02 

.12 

.10 

.13 

.15 

1.6 

1.3 

.05 

.05 

.14 

.12 

2041 

2112 

4510 

4910 

.14 

.12 

.03 

.03 

1.5 

1.3 

.04 

.02 

104 

127 

55 

67 

1.4 

1.2 

. 31 

.32 

2.1 

1.9 

.91 

.80 

2013  2412 


AV. = Average
C.I. = Confidence Interval (.05 Level)

Table 2.3 Results of the Bandwidth Distortions and Frequency Shift Distortions on Vowels. All Confidence Intervals are at the .05 Level.

SPECTRAL                  BANDLIMIT DISTORTION            NOISE DISTORTION
DISTORTION                (cutoff, kHz)                   (S/N, dB)
MEASURES                  2.8     2.2     1.8     1.4     13      10      7       3

D1 LOG LPC        AV.     7.3     12.1    14.6    16.2    1.7     2.8     5.0     7.8
                  C.I.    1.1     2.4     2.8     3.5     .22     .62     .97     1.81
D2 LOG LPC        AV.     8.1     13.3    15.6    17.5    1.9     3.2     5.2     8.6
                  C.I.    1.2     2.3     3.1     3.6     .31     .82     1.4     2.6
D4 LOG LPC        AV.     9.4     14.4    16.7    18.2    2.4     3.6     5.6     10.1
                  C.I.    1.4     2.5     3.5     3.7     .40     1.02    1.05    1.19
D2 LINEAR LPC     AV.     6851    7175    8281    9143    5431    5941    6643    7141
                  C.I.    855     991     1097    1211    2413    2712    3143    4127
D2 CEPSTRUM       AV.     8.8     14.1    16.0    18.1    1.6     3.1     5.2     8.8
                  C.I.    1.3     2.2     3.3     3.6     .33     .91     1.3     2.7
D2 PARCOR         AV.     5.2     5.5     5.9     6.3     3.1     3.6     4.3     4.6
                  C.I.    1.1     1.3     1.2     1.6     .81     .80     .93     .92
D2 FEEDBACK       AV.     827     955     1010    1210    621     751     827     921
                  C.I.    310     341     381     425     125     281     317     397
D2 AREA           AV.     5.3     5.9     6.6     6.9     2.8     2.9     3.1     3.3
                  C.I.    .34     .41     .55     .57     .21     .35     .44     .89
D2 POLE LOCATION  AV.     6.6     6.7     6.7     6.9     4.1     4.4     4.9     5.2
                  C.I.    3.4     3.3     3.3     3.6     2.2     2.1     2.7     2.6

AV. = Average
C.I. = Confidence Interval (.05 Level)

Table 2.4 Results of the Bandlimit Distortion and Additive Noise Distortion on Vowels. All Confidence Intervals Are at the .05 Level.
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the  .05  level.  This  point  is  also  illustrated  in  Figures  2.15  and  2.16, 
which show the time behavior of the d_2 log LPC measure for the frequency and bandwidth distortion. As judged by their confidence intervals, the log LPC measures are the best, while the pole position and feedback coefficients are the worst for those two frequency distortions. Second, note that, for low pass filter distortion (Table 2.4), the results are qualitatively the same as those above. But also note that quantitatively they are very different, giving much greater spectral distances than the bandwidth and frequency shift distortions. This can also be seen in Figure 2.17. This brings up an important, if obvious, point: low pass filtering distortion swamps the more subtle forms of frequency distortion. Hence, some bandwidth decision and control is necessary in these objective tests if the more subtle distortions are to be measured.

The noise results show some resolving power for the various noise levels, but a general loss of resolution when compared to the frequency and bandwidth results. Stated simply, this type of distortion is not measured well by spectral distance measures, and hence requires a large sample of speech to detect it properly.

The results of the pitch variation studies presented in Table 2.5 show that essentially no spectral distance measure can detect pitch errors with the number of samples used in this experiment. This, of course, was an expected result, and was the reason that the special pitch tests were included.

The cepstral pitch measure described in Section 2.2.2.2 was applied to the four pitch distortions using each of the four smoothing window functions shown in Figure 2.17.

FIGURE 2.15 PLOTS OF d2 LOG LPC SPECTRAL DISTANCE MEASURES FOR THE SYNTHETIC VOWEL FOR VARIOUS BANDWIDTH DISTORTION FACTORS. THE DISTORTION IS FORMED FROM a_i -> a_i α^i WHERE α IS THE BANDWIDTH DISTORTION FACTOR.
(Panels: bandwidth distortion factor = .99, .98, .97, and .95. Horizontal axis: time. Vertical scale: 0.0 to 1.0 dB.)

FIGURE 2.16 PLOTS OF d2 LOG LPC SPECTRAL DISTANCE MEASURES FOR THE SYNTHETIC VOWEL FOR VARIOUS FREQUENCY SHIFT DISTORTION RATIOS.

FIGURE 2.17(a) PLOTS OF THE d2 LOG LPC SPECTRAL DISTANCE MEASURE ON VOWELS FOR THE VARIOUS DISTORTIONS USED IN THIS STUDY.
(Horizontal axis: sample point, 1 to 5.)

FIGURE 2.17(b) PLOTS OF THE d2 LOG LPC SPECTRAL DISTANCE MEASURES ON SENTENCES FOR THE VARIOUS DISTORTIONS USED IN THIS STUDY.

SPECTRAL                  PITCH DISTORTION
DISTORTION                (voiced frames per step, period change in samples)
MEASURES                  10,1    10,-1   4,1     4,-1

D1 LOG LPC        AV.     .071    .064    .073    .072
                  C.I.    .03     .03     .04     .03
D2 LOG LPC        AV.     .079    .081    .076    .078
                  C.I.    .03     .03     .03     .03
D4 LOG LPC        AV.     .09     .092    .084    .092
                  C.I.    .04     .05     .04     .04
D2 LINEAR LPC     AV.     821     871     888     841
                  C.I.    640     510     530     511
D2 CEPSTRUM       AV.     .82     .86     .84     .81
                  C.I.    .03     .03     .04     .03
D2 PARCOR         AV.     .91     .84     .88     .86
                  C.I.    .06     .05     .06     .05
D2 FEEDBACK       AV.     87      88      83      89
                  C.I.    48      51      55      46
D2 AREA           AV.     .91     .96     .81     .86
                  C.I.    .21     .23     .20     .19
D2 POLE LOCATION  AV.     2.1     2.0     2.2     2.3
                  C.I.    1.10    1.02    1.05    .98

AV. = Average
C.I. = Confidence Interval (.05 Level)

Table 2.5 Results of the Pitch Distortions on Vowels. Note that the Distortions are Low, and Increased Distortions Cause No Increase in the Measures.


Since this was a time varying distortion, the statistical analysis used in the spectral distance tests is inappropriate. Figures 2.18-2.21 show the results for the four windows. The basic result here is that this measure forms a high resolution measure of pitch errors. For short windows, the measure detects very small errors, but saturates quickly, hence reporting the same result for all errors. Longer windows do a better quantification of the pitch errors, but do not pick up small errors well. Probably, since most of the computation in this measure is in the cepstrum calculation, a reasonable solution would be to use several windows to better quantify the results.

7 . 3.3.2  Results  of  the  Sentence  Tests 

The  results  of  the  sentence  tests  are  tabulated  in  Table  2.6, 

2.7,  and  2.8.  Qualitatively,  these  results  pretty  well  mirror  the 
results  of  the  vowel  tests.  Quantitatively,  however,  the  confidence 
intervals  are  uniformly  larger.  The  genera1,  result  here,  therefore,  is 
that  larger  sample  sizes  are  necessary  when  dealing  with  real  sentences. 

An  important  result  of  the  sentence  tests  can  be  seen  from  a 
comparison  of  the  gain  weighted  measures  to  the  non  gain  weighted 
measures,  as  shown  in  Table  2.9.  In  nearly  every  case,  the  gain 
weighting  causes  the  measure  to  decrease.  This  means  the  measure  is 
being  inflated  by  the  low  power  unvoiced  regions  which  are  perceptually 
less  important  than  the  high  vocalic  regions.  This  means  that  gain 
weighting  probably  will  give  better  subjective  correlation. 
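The effect described above can be illustrated with a small sketch. It
assumes, since the weighting formula is not restated in this section, that
gain weighting means averaging the per-frame distances with weights
proportional to frame power. The function names and the sample numbers are
hypothetical and are not taken from the study.

```python
def unweighted_mean(distances):
    """Plain average of per-frame spectral distances."""
    return sum(distances) / len(distances)


def gain_weighted_mean(distances, gains):
    """Average of per-frame distances weighted by frame gain (power).

    Assumption: weights are proportional to frame gain; the exact
    weighting used in the study is not reproduced here.
    """
    if len(distances) != len(gains):
        raise ValueError("one gain per frame is required")
    total_gain = sum(gains)
    if total_gain <= 0:
        raise ValueError("total gain must be positive")
    return sum(g * d for g, d in zip(gains, distances)) / total_gain


# Hypothetical frames: two loud vocalic frames with small distances and
# two quiet unvoiced frames with large distances.
frame_distances = [1.0, 1.0, 4.0, 4.0]
frame_gains = [10.0, 10.0, 1.0, 1.0]

plain = unweighted_mean(frame_distances)
weighted = gain_weighted_mean(frame_distances, frame_gains)
```

In this made-up case the quiet frames dominate the plain average, while
the weighted average is pulled toward the loud frames, which is the
direction of change reported in Table 2.9.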

2.4 The PARM Correlation Study

As was stated in the introduction, the PARM subjective quality
data base offers a good chance to study the correlation between the
objective measures under consideration and the isometric subjective
FIGURE 2.18 CEPSTRAL PITCH METRIC AS A FUNCTION OF TIME FOR FOUR
DIFFERENT PITCH DISTORTIONS FOR WINDOW NO. 1 (FIGURE 2.3).
WINDOW LENGTH = 1.
[Plot not reproduced. Horizontal axis: TIME.]

FIGURE 2.19 CEPSTRAL PITCH METRIC AS A FUNCTION OF TIME FOR FOUR DIFFERENT
PITCH DISTORTIONS FOR WINDOW NO. 2 (FIGURE 2.3). WINDOW LENGTH = 4.
[Plot not reproduced. Panels: PITCH VARIATION 10,1; 10,-1; 4,1; 4,-1.
Horizontal axis: TIME.]

FIGURE 2.20 CEPSTRAL PITCH METRIC AS A FUNCTION OF TIME FOR FOUR DIFFERENT
PITCH DISTORTIONS FOR WINDOW NO. 3 (FIGURE 2.3). WINDOW LENGTH = 10.
[Plot not reproduced. Panels: PITCH VARIATION 10,1; 10,-1; 4,1; 4,-1.
Horizontal axis: TIME.]

FIGURE 2.21 CEPSTRAL PITCH METRIC AS A FUNCTION OF TIME FOR FOUR DIFFERENT
DISTORTIONS FOR WINDOW NO. 4 (FIGURE 2.3). WINDOW LENGTH = 10.
[Plot not reproduced. Legible panels: PITCH VARIATION 4,1; 4,-1.
Horizontal axis: TIME.]


SPECTRAL                   BANDWIDTH DISTORTIONS      FREQUENCY SHIFT DISTORTIONS
DISTORTION
MEASURES   SHIFT RATIOS   .99    .98    .97    .95    50/49  49/50  10/9   9/10

D1 LOG LPC        AV.     .54    .88    1.2    1.6    .61    .58    1.7    1.9
                  C.I.    .13    .13    .16    .22    .13    .19    .24
                          (seven of eight entries legible; column
                          positions uncertain)
D2 LOG LPC        AV.     .62    .94    1.56   1.9    .71    .68    2.4    2.2
                  C.I.    .12    .14    .17    .23    .14    .13    .27    .28
D4 LOG LPC        AV.     .83    1.21   1.8    2.2    .94    1.02   3.1    3.4
                  C.I.    .13    .16    .19    .24    .16    .16    .29    .29
D2 LINEAR LPC     AV.     2910   3816   4715   6144   3415   2916   6913   6314
                  C.I.    2010   2415   3103   3310   2413   1918   3412   3321
D1 CEPSTRUM       AV.     .75    1.05   1.60   2.0    .82    .77    1.96   2.1
                  C.I.    .14    .14    .19    .23    .15    .16    .3     .29
D2 PARCOR         AV.     2.4    2.9    2.9    4.1    1.9    1.8    4.1    3.2
                  C.I.    1.6    1.5    1.7    2.2    1.2    1.0    2.1    1.8
D2 FEEDBACK       AV.     420    461    520    850    480    455    1023   981
                  C.I.    225    251    312    515    310    295    612    580
D2 AREA           AV.     3.4    3.9    5.9    8.2    3.3    3.5    8.1    8.1
                  C.I.    1.2    1.3    2.4    4.2    1.4    1.1    3.4    4.1
D2 POLE LOCATION  AV.     4.6    4.9    5.4    6.3    4.8    4.6    6.8    6.3
                  C.I.    2.4    3.1    4.1    4.8    3.1    2.8    4.4    4.2

AV. = Average    C.I. = Confidence Interval

Table 2.6 Results of the Bandwidth Distortions and
Frequency Shift Distortions on Sentences.
All Confidence Intervals are at the .05
Significance Level.


SPECTRAL                   BANDLIMIT DISTORTION
DISTORTION                 BANDLIMIT (kHz)
MEASURES                   2.8     2.2     1.8

D1 LOG LPC        AV.      7.5     15.4    16.8
                  C.I.     2.7     5.8     5.7
D2 LOG LPC        AV.      6.1     16.3    16.9
                  C.I.     1.3     7.2     7.1
D4 LOG LPC        AV.      8.4     16.2    16.8
                  C.I.     1.5     6.8     7.5
D2 LINEAR LPC     AV.      8142    9317    9581
                  C.I.     2014    2713    2312
D1 CEPSTRUM       AV.      5.4     8.3     12.4
                  C.I.     1.3     2.2     3.1
D2 PARCOR         AV.      7.1     8.3     8.9
                  C.I.     3.6     3.9     4.7
D2 FEEDBACK       AV.      1013    1314    1517
                  C.I.     712     692     851
D2 AREA           AV.      6.7     7.3     8.2
                  C.I.     1.3     1.9     2.3
D2 POLE LOCATION  AV.      7.2     7.7     7.5
                  C.I.     4.4     4.7     3.9

NOISE DISTORTION (S/N): columns not legible in the source; only the
fragments 5176, 2310, and 3412 survive.

AV. = Average    C.I. = Confidence Interval (.05)

Table 2.7 Results of the Bandlimit Distortions and
Additive Noise Distortion on Sentences.
All Confidence Intervals are at the .05
Significance Level.


SPECTRAL                   PITCH DISTORTIONS
DISTORTION
MEASURES                   10,1    10,-1   4,1     4,-1

D1 LOG LPC        AV.      1.0     1.1     .90     .97
                  C.I.     .12     .31     .22     .24
D2 LOG LPC        AV.      .63     1.5     .94
                  C.I.     .11     .09     .10
D4 LOG LPC        AV.      ?       ?       1.8     ?
                  C.I.     ?       ?       .21     ?
D2 LINEAR LPC     AV.      ?       981     ?       1315
                  C.I.     ?       412     ?       640
D1 CEPSTRUM       AV.      ?       1.4     ?       1.4
                  C.I.     ?       .02     ?       .03
D2 PARCOR         AV.      2.0     1.9     2.3     ?
                  C.I.     .92     .82     1.1     ?
D2 FEEDBACK       AV.      310     412     ?       360
                  C.I.     240     270     ?       170
D2 AREA           AV.      2.6     2.4     2.8     ?
                  C.I.     .62     .51     .83     ?
D2 POLE LOCATION  AV.      3.9     ?
                  C.I.     1.9     ?

? = entry not legible in the source. Rows with fewer than four entries
list the surviving values in source order; their column positions are
uncertain.

AV. = Average
C.I. = Confidence Interval (.05)

Table 2.8 Results of the Pitch Distortion Study on Vowels.
All Confidence Intervals are at the .05 Significance Level.
DISTORTION                  NON-GAIN WEIGHTED    GAIN WEIGHTED

Bandwidth        .99             .62                 .38
Bandwidth        .98             .94                 .67
Bandwidth        .97             1.56                1.64
Bandwidth        .95             1.9                 1.51
Frequency Shift  50/49           .71                 .37
Frequency Shift  49/50           .68                 .37
Frequency Shift  10/9            2.4                 1.92
Frequency Shift  9/10            2.2                 2.12
Bandlimit        2.8 kHz         6.1                 4.3
Bandlimit        2.2 kHz         16.3                12.4
Bandlimit        1.8 kHz         16.9                14.7
Bandlimit        1.4 kHz         17.5                16.8
Noise            13 dB           1.2                 .82
Noise            10 dB           2.4                 1.81
Noise            7 dB            4.1                 3.6
Noise            3 dB            6.6                 5.4

Table 2.9 Comparison of Gain Weighted D2 Log LPC Spectral
Metrics to Non-Gain Weighted D2 Log LPC Spectral
Metrics.
results available from the PARM. Since many of the objective measures
under study are computationally intensive, computer time limited the
total number of speech digitization systems which could be used as part
of the study. In all, eight systems were studied, as shown in Table
2.10. These systems were chosen (1) to represent a cross-section of
speech digitization techniques, including waveform coders (CVSD), LPC's,
channel vocoders, and APC's, and (2) to overlap with the systems used in
the development of a parametric quality test, called the "QUART" Test
[2.24]. This allows some minimal correlation studies between the
objective quality measures produced here and the parametric results
available from the QUART test.

2.4.1  The  PARM  Data  Base 

The PARM data base arrived at Georgia Tech as fourteen boxes of
cards, with control cards for processing under an IBM operating system.
Since correlation studies require many accesses of the data base, and
since the accesses are random, a linear data base such as that
represented by the cards is unacceptable. An acceptable data base
organization must (1) be stored in numeric (two's complement) form rather
than character form, and (2) be accessible by some coding scheme which
does not require linear searching of the disk based data. To do
this, the system of Figure 2.22 was developed. In this system, a
"MAIN DATA FILE" was organized in which each set of responses for each
subject is allocated a directly accessible block of 64 sixteen bit words,
60 for the subject's responses and four for a label. To go with this
main file, four "POINTER FILES" were developed. The first pointer file,
the "PARM IDENTITY FILE," has an entry for each PARM giving basic PARM
data, such as systems involved, speakers involved, and a pointer to the
main data file. The second pointer file, the "SPEAKER FILE," has
    HI ANCHOR
1.  CVSD - 52 - 0%
2.  CVSD - 16 - 0%
3.  CVSD - 9.6 - 0%
4.  LPC - 4.8 - 0% (Lincoln Labs)
5.  LPC - 3.6 - 0% (Lincoln Labs)
6.  LPC - 2.4 - 0% (Lincoln Labs)
7.  APC - 0%
8.  PARKHILL - 20 dB S/N
9.  HY2 - 2.4 - 0%

Table 2.10 Systems Used in the
PARM Correlation
Study.
FIGURE 2.22 LAYOUT OF PARM ACCESS DATA USED AS PART OF THIS STUDY.
EACH BOX REPRESENTS A DISK FILE. THE DATA IS PRESORTED
IN THE DATA FILES TO ALLOW EASY ACCESS OF THE PARM
DATA SETS.


information for each speaker as to where each PARM involving that speaker
is located. The third file, the "SUBJECT FILE," contains a list, by
subject, of where each of that subject's responses is located. The
last pointer file, the "SYSTEM FILE," contains, for each system, the
location of all that system's subjective data.

The idea behind this organization is that, by presorting on the
data subsets of potential interest, the average access time for a
particular statistical measure can be greatly reduced. Hence, a
statistical program need only search the much smaller pointer files for
information rather than searching the whole data base. Further, since
within each pointer file the data is ordered by increasing PARM number,
only a minimum number of accesses of the main data file are necessary on
a particular run.
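The organization described above can be sketched in a few lines. The
block size (64 sixteen bit two's complement words, 60 for responses and
four for a label) is taken from the text. Everything else is an
assumption made for illustration: the contents and order of the four
label words, the little-endian byte order, the in-memory stand-in for the
disk file, and all function names.

```python
import io
import struct

RESPONSE_WORDS = 60                 # subject responses per block
LABEL_WORDS = 4                     # label words per block
BLOCK_FORMAT = "<64h"               # 64 signed 16-bit words (byte order assumed)
BLOCK_BYTES = struct.calcsize(BLOCK_FORMAT)


def write_block(main_file, block_number, responses, label):
    """Store one response set at a directly addressable position."""
    if len(responses) != RESPONSE_WORDS or len(label) != LABEL_WORDS:
        raise ValueError("a block holds 60 responses and a 4-word label")
    main_file.seek(block_number * BLOCK_BYTES)
    main_file.write(struct.pack(BLOCK_FORMAT, *responses, *label))


def read_block(main_file, block_number):
    """Fetch one block without scanning the rest of the file."""
    main_file.seek(block_number * BLOCK_BYTES)
    words = struct.unpack(BLOCK_FORMAT, main_file.read(BLOCK_BYTES))
    return list(words[:RESPONSE_WORDS]), list(words[RESPONSE_WORDS:])


def build_pointer_file(labels_by_block, key_index):
    """Map one label field to the ordered list of blocks that carry it."""
    pointer = {}
    for block_number in sorted(labels_by_block):
        key = labels_by_block[block_number][key_index]
        pointer.setdefault(key, []).append(block_number)
    return pointer


# Hypothetical label layout: (PARM number, system, speaker, subject).
labels = {0: (1, 3, 0, 7), 1: (1, 3, 0, 8), 2: (2, 5, 1, 7)}
main_file = io.BytesIO()            # stands in for the disk based main data file
for block_number, label in labels.items():
    write_block(main_file, block_number, [block_number] * RESPONSE_WORDS, label)

subject_file = build_pointer_file(labels, key_index=3)
system_file = build_pointer_file(labels, key_index=1)

# All data for subject 7 is reached through the small pointer file.
responses, label = read_block(main_file, subject_file[7][1])
```

A statistical program using such a layout consults only the pointer
dictionary for the subset it needs and then seeks directly to each block,
which is the access pattern the text argues for.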

Two things should be noted about this data base organization.
First, the presorting of this data is a non-trivial computational task,
involving many hours of computer sorting. This data base itself,
therefore, is an important output of this effort, and may be used in
the future for many classes of studies. Second, due to time constraints,
DCEC was unable to make available enough information concerning the
PARM data to take full advantage of this data base. Hence, the
statistical resolving power afforded by this data base is better than
that achieved by this study. Details of how the analysis could be
improved are given later in this section.

2.4.2  The  Statistical  Analysis 

The  objective  measures  used  in  this  study  are  shown  in  Table 
2.11.  The  measures  involved  are  essentially  all  the  spectral  distance 
measures  used  in  the  controlled  distortion  study  (Section  2.3)  plus 
 1.  D1  LOG LPC
 2.  D1  LOG LPC GAIN WEIGHTED
 3.  D2  LOG LPC
 4.  D2  LOG LPC GAIN WEIGHTED
 5.  D4  LOG LPC
 6.  D4  LOG LPC GAIN WEIGHTED
 7.  D2  LINEAR LPC
 8.  D2  LINEAR LPC GAIN WEIGHTED
 9.  D1  CEPSTRUM
10.  D1  CEPSTRUM GAIN WEIGHTED
11.  D2  PARCOR
12.  D2  FEEDBACK
13.  D2  AREA
14.  D2  POLE LOCATION
15.  D2  ENERGY RATIO

Table 2.11 Objective Measures Used in the
PARM Correlation Study.
one additional measure which has had some attention in the literature
[2.38].

The speech data used for this study was twelve sentences for
each of two speakers (LL and CH) for each of the systems of Table 2.10.
After the measures were applied, the statistical analysis performed was
identical to that done for the controlled distortion tests.

In the correlation study, the categories recognized were
"SUBJECT" and "SPEAKER." If the information had been available as to
exactly which sentence was involved in which PARM, then "SENTENCE"
could have been a category, increasing the degrees of freedom by
approximately a factor of six. The correlation coefficients calculated
were from

\[ \hat{\rho} = \frac{1}{k} \sum_{\text{subjects}} \sum_{\text{speakers}} \sum_{\text{systems}} \rho_a \tag{2.25} \]

where k is the number of terms in the sum and

\[ \rho_a = \left( \frac{X_{as} - \bar{X}_s}{\sigma_s} \right) \left( \frac{D_a - \bar{D}}{\sigma_D} \right) \tag{2.26} \]

where "a" is the condition including subject, speaker, and system, $D_a$
is the distortion measure for that system, $\bar{D}$ is the estimate of
D, $X_{as}$ is the subject's response to condition "a", $\bar{X}_s$ is
the average response for that subject over all systems, $\sigma_s$ is the
sample standard deviation for the subject "s," and $\sigma_D$ is the
sample standard deviation for the objective distortion measures.
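Equations 2.25 and 2.26 can be sketched directly. The sketch below makes
several assumptions that are not stated in the text: the standard
deviations use the divisor n (so that perfectly linear data gives a
coefficient of exactly plus or minus one), the distortion measure is
indexed by system only, and the data layout, function name, and numbers
are hypothetical.

```python
from statistics import mean, pstdev


def pooled_correlation(responses, distortion):
    """Average of the per-condition products of Equation 2.26.

    responses:  {(subject, speaker, system): subjective response X}
    distortion: {system: objective distortion measure D}
    Each subject must give at least two different responses, otherwise
    that subject's standard deviation is zero.
    """
    by_subject = {}
    for (subject, _speaker, _system), x in responses.items():
        by_subject.setdefault(subject, []).append(x)
    x_mean = {s: mean(v) for s, v in by_subject.items()}
    x_std = {s: pstdev(v) for s, v in by_subject.items()}

    d_values = [distortion[system] for (_subj, _spk, system) in responses]
    d_mean, d_std = mean(d_values), pstdev(d_values)

    terms = [
        ((x - x_mean[subject]) / x_std[subject])
        * ((distortion[system] - d_mean) / d_std)
        for (subject, _speaker, system), x in responses.items()
    ]
    return sum(terms) / len(terms)


# Hypothetical data: two subjects, one speaker, three systems.
distortion = {"A": 1.0, "B": 2.0, "C": 3.0}
responses = {
    (1, "LL", "A"): 80.0, (1, "LL", "B"): 60.0, (1, "LL", "C"): 40.0,
    (2, "LL", "A"): 70.0, (2, "LL", "B"): 60.0, (2, "LL", "C"): 50.0,
}
rho_hat = pooled_correlation(responses, distortion)
```

Because each subject's responses are normalized by that subject's own mean
and spread, the two subjects above contribute equally even though they use
different parts of the rating scale.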

In order to understand how these results are tabulated, it is
first necessary to understand how results from the objective measures
can be used to predict results from subjective tests.
The most straightforward way of deriving an estimate of the
subjective quality is now given. Since both the subjective and objective
measures for quality are means of a large number of independent
estimates, their marginal probability distribution functions are
asymptotically normal, and, by the Bivariate Central Limit theorem,
the joint probability distribution function is given by the Bivariate
normal distribution:

\[ f(X,D) = \frac{1}{2\pi\sigma_X\sigma_D\sqrt{1-\rho^2}} \exp\left\{ -\frac{1}{2(1-\rho^2)} \left[ \left(\frac{X-\bar{X}}{\sigma_X}\right)^2 - \frac{2\rho(X-\bar{X})(D-\bar{D})}{\sigma_X\sigma_D} + \left(\frac{D-\bar{D}}{\sigma_D}\right)^2 \right] \right\} \tag{2.27} \]

where X is the subjective measure, D is the objective measure,
$\sigma_X^2$ is the variance of the subjective measure, $\sigma_D^2$ is
the variance of the objective measure, and $\rho$ is the correlation
coefficient. For this case, the minimum variance unbiased estimator of X
from D is given by

\[ \hat{X} = \bar{X} + \rho\,\frac{\sigma_X}{\sigma_D}\,(D-\bar{D}) \tag{2.28} \]

where the variance of this measure is given by

\[ E\left[ \left( X - E[X \mid D] \right)^2 \right] = \sigma_X^2 (1-\rho^2) \tag{2.29} \]

If $\bar{X}$, $\bar{D}$, $\sigma_X$, $\sigma_D$, and $\rho$ were known,
this problem would be solved, since this is enough information to
calculate confidence intervals on X or to do null hypothesis testing
between systems. However, estimates for these quantities, called
$\hat{\bar{X}}$, $\hat{\bar{D}}$, $\hat{\sigma}_X$, $\hat{\sigma}_D$, and
$\hat{\rho}$, must be used instead, and these quantities are random
variables themselves. Hence, the p.d.f. (Probability Distribution
Function) is no longer normal, and is, in general, very difficult to
calculate in closed form.
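For the idealized case in which the means, standard deviations, and
correlation are taken as known, Equations 2.28 and 2.29 reduce to two
short functions. This is only a sketch of that idealized case; the
function names and all numerical values are hypothetical.

```python
import math


def predict_subjective(d, x_mean, d_mean, sigma_x, sigma_d, rho):
    """Estimate X from D as in Equation 2.28 (parameters assumed known)."""
    return x_mean + rho * (sigma_x / sigma_d) * (d - d_mean)


def ideal_error_std(sigma_x, rho):
    """Square root of the variance in Equation 2.29."""
    return sigma_x * math.sqrt(1.0 - rho ** 2)


# Hypothetical values: a distance one unit above its mean, with a
# negative correlation between distance and acceptability.
x_hat = predict_subjective(d=3.0, x_mean=50.0, d_mean=2.0,
                           sigma_x=10.0, sigma_d=0.5, rho=-0.8)
error_std = ideal_error_std(sigma_x=10.0, rho=-0.8)
```

With these made-up numbers the prediction drops from 50 to 34, and a
correlation of magnitude 0.8 still leaves 60% of the original standard
deviation in the prediction error, which shows how quickly the resolving
power falls as the correlation moves away from one.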

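However, considering the problem from the point of view of
regression analysis theory offers additional information. The form of
the linear regression estimation is given by

\[ \hat{X} = \beta_1 + \beta_2 D . \tag{2.30} \]

From the Gauss-Markov Theorem [2.40], the least squares estimate is the
unbiased minimum variance estimate for X, and for this case (this is
really an LPC analysis)

\[ \hat{\beta}_2 = \frac{N \sum_{j=1}^{N} X_j D_j - \left( \sum_{j=1}^{N} X_j \right) \left( \sum_{j=1}^{N} D_j \right)}{N \sum_{j=1}^{N} D_j^2 - \left( \sum_{j=1}^{N} D_j \right)^2} = \frac{\hat{\rho}\,\hat{\sigma}_X}{\hat{\sigma}_D} \tag{2.31} \]

and

\[ \hat{\beta}_1 = \frac{1}{N} \sum_{j=1}^{N} X_j - \hat{\beta}_2\,\frac{1}{N} \sum_{j=1}^{N} D_j = \bar{X} - \frac{\hat{\rho}\,\hat{\sigma}_X}{\hat{\sigma}_D}\,\bar{D} \tag{2.32} \]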


Two points should be made here. First, these results show that the
minimum variance unbiased estimator of X from D is gotten by using the
minimum variance unbiased estimations for $\bar{D}$, $\bar{X}$,
$\sigma_X$, $\sigma_D$, and $\rho$ in Equation 2.28. Second, it should be
noted that under a mild set of conditions easily met by the tests here,
four conditions hold:

(1) a minimum variance unbiased estimate for $\sigma_{\hat{X}}^2$, the
variance in our approximation of the subjective quality, is given by

\[ \hat{\sigma}_{\hat{X}}^2 = \frac{1}{N-2} \sum_{j=1}^{N} \left( X_j - \hat{\beta}_1 - \hat{\beta}_2 D_j \right)^2 ; \]

(2) a minimum variance unbiased estimate for the variance in
$\hat{\beta}_1$ is given by

\[ \hat{\sigma}_{\beta_1}^2 = \hat{\sigma}_{\hat{X}}^2 \left( \frac{1}{N} + \frac{\bar{D}^2}{\sum_{j=1}^{N} (D_j - \bar{D})^2} \right) ; \tag{2.33} \]

(3) a minimum variance unbiased estimate for the variance in
$\hat{\beta}_2$ is given by

\[ \hat{\sigma}_{\beta_2}^2 = \frac{\hat{\sigma}_{\hat{X}}^2}{\sum_{j=1}^{N} (D_j - \bar{D})^2} ; \tag{2.34} \]

and (4) the estimates for $\beta_1$ and $\beta_2$ ($\hat{\beta}_1$ and
$\hat{\beta}_2$) are normally distributed, the ratios
$\hat{\sigma}_{\hat{X}}^2/\sigma_{\hat{X}}^2$,
$\hat{\sigma}_{\beta_1}^2/\sigma_{\beta_1}^2$, and
$\hat{\sigma}_{\beta_2}^2/\sigma_{\beta_2}^2$ are $\chi^2$ distributed,
and all five estimates are independent. These four points give all of the
statistical power necessary to do all the hypothesis testing and
confidence interval estimation which is normally associated with
statistical testing and estimation. For example, if a confidence interval
for $\beta_1$ was desired, it is only necessary to note that
$(\hat{\beta}_1 - \beta_1)/\hat{\sigma}_{\beta_1}$ is t distributed, and
the confidence interval is given by

\[ \hat{\beta}_1 - U_{\alpha,N-2}\,\hat{\sigma}_{\beta_1} \le \beta_1 \le \hat{\beta}_1 - L_{\alpha,N-2}\,\hat{\sigma}_{\beta_1} \tag{2.35} \]

where $U_{\alpha,N-2}$ and $L_{\alpha,N-2}$ are the upper and lower
significance limits for a t distribution ($\mu = 0$, $\sigma = 1$) for
N-2 degrees of freedom and probability $\alpha$.
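The regression quantities discussed above (Equations 2.30 through 2.35)
can be computed with a short routine. This is a sketch using the standard
simple linear regression formulas with D as the regressor; the function
names, the data, and the choice of a symmetric interval (upper limit equal
to minus the lower limit) are assumptions. The critical value of the t
distribution is supplied by the caller from a table rather than computed.

```python
import math


def fit_regression(d_values, x_values):
    """Least squares fit of X = beta1 + beta2 * D with standard errors."""
    n = len(d_values)
    if n != len(x_values) or n < 3:
        raise ValueError("need at least three paired observations")
    d_mean = sum(d_values) / n
    x_mean = sum(x_values) / n
    s_dd = sum((d - d_mean) ** 2 for d in d_values)
    s_dx = sum((d - d_mean) * (x - x_mean)
               for d, x in zip(d_values, x_values))
    beta2 = s_dx / s_dd                       # slope
    beta1 = x_mean - beta2 * d_mean           # intercept
    # Residual variance with N - 2 degrees of freedom.
    residual_var = sum((x - beta1 - beta2 * d) ** 2
                       for d, x in zip(d_values, x_values)) / (n - 2)
    se_beta1 = math.sqrt(residual_var * (1.0 / n + d_mean ** 2 / s_dd))
    se_beta2 = math.sqrt(residual_var / s_dd)
    return beta1, beta2, residual_var, se_beta1, se_beta2


def confidence_interval(estimate, std_error, t_critical):
    """Symmetric two-sided interval around an estimate."""
    return (estimate - t_critical * std_error,
            estimate + t_critical * std_error)


# Hypothetical data: four systems, one objective and one subjective value each.
distortions = [1.0, 2.0, 3.0, 4.0]
scores = [9.0, 7.1, 4.9, 3.0]
beta1, beta2, residual_var, se_beta1, se_beta2 = fit_regression(
    distortions, scores)

T_CRITICAL = 4.303   # two-sided 95% point of t with N - 2 = 2 degrees of freedom
low, high = confidence_interval(beta1, se_beta1, T_CRITICAL)
```

With only four points the t critical value is large, so the interval on
the intercept is wide even though the fit is close; this is the same
small-sample effect that limits the resolving power discussed in the text.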


There are really two questions which these tests seek to answer.
First, assuming that the estimates we have for correlations, means, and
variances are exactly correct, what would then be the confidence intervals
on our estimates of X? This question seeks to ascertain the potential
of the objective measures used here to predict subjective results.
Second, considering all the distorting factors in our analysis,
especially our errors in estimating $\beta_1$ and $\beta_2$, what then is
the resolving power of our test? This question addresses the usable
resolving power of subjective acceptability estimates based on the
analysis performed so far. The first question can be addressed by
applying Equation 2.29 to the estimates of the correlation coefficients
(Equation 2.25). The answer to the second question can be observed by
applying Equation 2.32 to the data.

2.4.3  The  Experimental  Results 

The correlation studies described above were carried out on
three sets of the data: all the systems; only the vocoder systems
(LPC and channel vocoders); and only the waveform coders. The results
for the three studies are given in Tables 2.12, 2.13, and 2.14,
respectively. Several points should be made here. First, the
correlation coefficients for a number of measures are quite high, some as
high as .83. The "BEST" measures seem to be gain weighted spectral
distance measures, as expected. Second, however, note that the estimated
standard deviations are somewhat larger than desirable. This indicates
that more data should be used to better establish these results. Third,
note that much better results are obtained for the small subclasses than
for the whole. This indicates that these measures work best if the
systems being tested are preclassified according to the type of
distortion expected.
SPECTRAL DISTORTION MEASURES        ρ̂       σ̂_eI     σ̂_e

D1 LOG LPC                         -.76     10.24    22.24
D1 LOG LPC GAIN WEIGHTED           -.79     8.13     16.13
D2 LOG LPC                         -.78     8.85     16.71
D2 LOG LPC GAIN WEIGHTED           -.81     7.21     13.3
D4 LOG LPC                         -.73     14.31    24.12
D4 LOG LPC GAIN WEIGHTED           -.78     8.31     16.3
D2 LINEAR LPC                      -.61     17.21    30.9
D2 LINEAR LPC GAIN WEIGHTED        -.66     13.21    27.1
D1 CEPSTRUM                        -.79     7.64     14.91
D1 CEPSTRUM GAIN WEIGHTED          -.81     6.98     13.91
D2 PARCOR                          -.55     22.1     40.7
D2 FEEDBACK                        -.23     37.1     61.2
D2 AREA                            -.76     12.41    21.6
D2 POLE LOCATION                   -.25     21.6     40.7
D2 ENERGY RATIO                    +.78     9.2      18.3

ρ̂ = Correlation estimate
σ̂_eI = Ideal standard deviation estimate (assuming ρ = ρ̂)
σ̂_e = Standard deviation estimate (full statistics)

Table 2.12 Results of Correlation Study
For Total Set of Systems
SPECTRAL DISTORTION MEASURES        ρ̂       σ̂_eI     σ̂_e

D1 LOG LPC                         -.79     8.13     14.23
D1 LOG LPC GAIN WEIGHTED           -.81     7.15     12.2
D2 LOG LPC                         -.79     8.27     18.3
D2 LOG LPC GAIN WEIGHTED           -.83     6.63     13.4
D4 LOG LPC                         -.77     8.95     18.1
D4 LOG LPC GAIN WEIGHTED           -.81     7.29     14.9
D2 LINEAR LPC                      -.70     16.31    31.6
D2 LINEAR LPC GAIN WEIGHTED        -.74     14.52    28.4
D1 CEPSTRUM                        -.81     7.52     13.72
D1 CEPSTRUM GAIN WEIGHTED          -.83     6.81     13.14
D2 PARCOR                          -.61     18.22    34.31
D2 FEEDBACK                        -.33     29.2     43.21
D2 AREA                            -.78     10.21    21.21
D2 POLE LOCATION                   -.36     36.3     61.3
D2 ENERGY RATIO                    +.80     7.82     14.9

ρ̂ = Correlation estimate
σ̂_eI = Ideal standard deviation estimate (assuming ρ = ρ̂)
σ̂_e = Standard deviation estimate (full statistics)

Table 2.13 Results of Correlation Study
Using Only Vocoders.


SPECTRAL DISTORTION MEASURES        ρ̂       σ̂_eI     σ̂_e

D1 LOG LPC                         -.79     8.23     14.12
D1 LOG LPC GAIN WEIGHTED           -.80     7.91     13.98
D2 LOG LPC                         -.78     9.41     18.91
D2 LOG LPC GAIN WEIGHTED           -.82     6.78     12.21
D4 LOG LPC                         -.76     12.2     24.31
D4 LOG LPC GAIN WEIGHTED           -.80     7.98     18.32
D2 LINEAR LPC                      -.73     14.23    29.31
D2 LINEAR LPC GAIN WEIGHTED        -.75     12.9     26.21
D1 CEPSTRUM                        -.79     9.21     18.51
D1 CEPSTRUM GAIN WEIGHTED          -.81?    6.91     12.91
D2 PARCOR                          -.58     27.4     42.95
D2 FEEDBACK                        -.21     40.2     51.2
D2 AREA                            -.74     18.4     40.91
D2 POLE LOCATION                   -.31     29.6     51.9
D2 ENERGY RATIO                    +.76     16.3     33.6

? = entry only partly legible in the source

ρ̂ = Correlation estimate
σ̂_eI = Ideal standard deviation estimate (assuming ρ = ρ̂)
σ̂_e = Standard deviation estimate (full statistics)

Table 2.14 Results of Correlation Study Using
Only Waveform Coders
These are certainly encouraging results. With measures as
highly correlated as these, there is good expectation of creating a
viable objective quality test. However, the relatively large estimated
standard deviations in the estimates which include all statistics
indicate more data must be processed to increase the resolving power
of these tests to a maximum.


2.5 Summary and Areas for Future Research

The  major  results  of  this  study  can  be  summarized  as  follows. 

(1)  A number  of  objective  quality  measures,  particularly 
spectral  distance  metrics,  offer  considerable  promise  in  predicting 
subjective  quality  results. 

(2) Some of the measures tested are clearly better than the
others. The best are the gain weighted log LPC spectral distance
measure and the gain weighted cepstral measure. These two measures
are highly correlated with each other [2.35].

(3) Several measures do consistently poorly. Two of these are
the feedback coefficient measure and the pole location measure.
The pole location measure would probably improve if some sort of formant
extraction was attempted.

(4)  The  D2  area  measure  did  quite  well.  This  is  interesting 
since  it  is  so  computationally  compact. 

(5)  Gain  weighting  gave  a slight,  but  consistent,  improvement 
in  the  subjective-objective  correlations. 

(6) Based on the values of $\hat{\rho}$ obtained in this study, the
potential for using several of the measures for predicting subjective
scores is good. However, it should be noted that, even so, the
resolving power of these tests falls short (by approximately a power
of 2-2.5) of the subjective tests themselves. Subjective and
objective measures may, however, be combined to improve resolution. This
is easily done so long as the number of subjective tests used warrants
the use of the Bivariate Normal Distribution.

(7) The resolving power of the actual tests which resulted from
this study is nowhere near as good as the "potential" resolving power.
This is because the resolving power of the tests in this study on $\rho$
was not good enough. This could be improved by doing a lower level
correlation between a subject's response and the objective measure for
the exact sentence used, and by using a larger portion of the PARM data
base as part of the study. It should be noted, however, that although
it is interesting to speculate on the improvement in the estimates of
$\rho$ that further testing would accomplish, no results should be
assumed until the testing is complete.

The  results  of  this  study  offer  a number  of  areas  for  future 
research.  Some  of  these  are  listed  below. 

(1)  An  obvious  extension  to  this  study  would  be  to  extend  the 
portion  of  the  PARM  data  base  used  in  this  study.  This  might  well 
improve  its  estimates  for  p. 

(2) Statistically improved results may also obviously be
obtained by finding measures which are more highly correlated with
subjective results. One approach is to simultaneously attempt to better
understand the parametric factors involved in human quality acceptance,
as has been attempted in the "QUART" and "DAM" tests, and to develop
objective measures which are highly correlated with the important
parametric subjective measures.

(3) Improvements are possible in the particular objective measures
used in the correlation studies. For example, Makhoul [2.13] suggests
several forms of frequency weighting in LPC spectral distance measures
which might be used to improve subjective-objective correlation.
CHAPTER 3


SUBJECTIVE  PREDICTION  OF  USER  PREFERENCE 

3.1  Introduction 

A crucial issue in the design and implementation of a digital
voice communication system is the prediction of user acceptability.
Even if the many other system design criteria are resolved and a good
engineering solution found, the system will fail unless people use it.
People will use it only if they find it highly acceptable on the basis
of their current telecommunications alternatives.

Speech testing has been categorized as quality testing or
intelligibility testing. The term preference testing or acceptability
testing really supersedes both terms, not as a replacement for either,
but as a combination of the essential features of each. That is,
preference is assumed to be based on a sufficient combination of quality
and intelligibility to determine relative user acceptability. It must
be recognized here that speech of 100% intelligibility may yet be of
unacceptable quality and hence of low preference, just as pleasant but
unintelligible speech is of low preference.

Just  as  with  quality  and  intelligibility  testing,  preference 
testing  can  be  implemented  with  a wide  variety  of  strategies  or 
methodologies.  The  test  may  be  subjective,  objective,  parametric, 
isometric,  based  on  absolute  or  relative  scales,  with  an  infinite 
variety  of  organizations.  Fortunately,  much  work  has  been  done  in  the 
testing  of  speech,  so  that  we  do  not  need  to  begin  from  scratch. 

In this chapter we will consider subjective testing. Objective
testing, another phase of this effort, is considered in Chapter 2.

3.2 Subjective Testing Philosophies

Subjective testing procedures are based on drawing from a
population of potential system users, i.e., subjects, their reactions to
the speech produced by a digital speech transmission system. These
reactions must be quantified somehow and are then averaged, or processed,
according to established statistical principles to arrive at a measure
of user acceptance or preference. The basic testing philosophies can
be listed as follows:

Iso-Preference Testing - involves the use of a known, agreed
upon reference signal condition for use as a comparison in judging an
unknown. The agreed upon conditioning must be parameterized so that
the unknown or test signal can be found equally acceptable to an
adjustment of the parameter set. This procedure then yields the
judgement that a given signal is as acceptable as some reference
condition.

Relative Preference Testing - involves comparisons, done
independently, with each of several reference conditions. The reference
conditions are used to establish a scale of preference, and an unknown
signal can then be ranked on this scale. The subjective scale of the
references must be agreed upon a priori.

Absolute Preference Testing - methods require the subjects
performing the test to give an absolute numerical evaluation to the
properties described in the test format. Properties tested can be
selected to describe various features of interest.
Isometric Testing for user preference calls for a direct
evaluation of preference from the test subjects. Each subject makes his
evaluation against the background of his total experience and personal
biases, including any local or instantaneous bias, with fatigue or
irritability effects built into his response.

Parametric Testing asks the test subject to make judgements with
respect to specific features of the speech signal under consideration.
The test format then has the flexibility of later weightings of feature
judgements to achieve a measure of acceptability which is more independent
of the individual subject's biases. The appropriate weightings must,
however, be agreed upon in the final resolution of test data.

The most recent application of these philosophies has resulted
in the PARM test and the QUART test [3.1] and more recently in the DAM
test [3.2].

In the PARM test (Paired Acceptability Rating Method) an
isometric approach is used. However, since systems being tested are
presented to the subjects in a carefully chosen ordering, paired
comparisons can be abstracted from the test results on an a posteriori
basis. To reduce the effects of extremes of responses typical in
isometric testing, the listeners are asked to judge two reference or
anchor conditions, one "good" and one "bad" anchor. Anchor responses
are then used to normalize other responses within and across listeners.
Details of the testing organization and exhaustive analysis of results
are found in [3.1].

In the QUART test (Quality Acceptance Rating Test) the parametric
philosophy is followed, with an isometric measure of overall acceptability
included as well. The listener is asked to score each system under
test with respect to a family of features and to give his overall
reaction. Extensive analysis of this approach is also well documented
[3.1].

An outgrowth of the background of subjective testing of speech
in general, and of experience with PARM and QUART in particular, after
substantial further refinement in the choice of a family of features to
use in direct response solicitation, is the DAM test (Diagnostic
Acceptability Measure).

The DAM test acquires ratings on perceptual features which have
been selected, after extensive experience with QUART, as those features
closely correlated with overall acceptability, nearly orthogonal to each
other, and directly related to specific system functions or to system
operating environment conditions. In addition, the feature set thus
extracted is small enough to allow efficient and reasonable subjective
testing to be accomplished. The DAM test is still evolving, but is
nearing a final form. Although it is not yet documented in the
literature, the test has been the subject of substantial interaction
between the speech research group at Georgia Tech and the group at
Dynastat. These discussions have been conducted in visits by A. M. Bush
and T. P. Barnwell to Dynastat and by W. D. Voiers to Georgia Tech. A
detailed description of the DAM test is included as Appendix A of this
report.

3.3  Statistical Testing Procedures

In subjective testing, as mentioned earlier, an essential aspect
of the test implementation is the statistical processing of the data,
i.e. responses from listeners or subjects, to obtain an average rating
of the system or system feature under test. Even though the field of
statistics is well documented, both in the scientific literature and in
textbook and reference book formats, it is our feeling that some
exposition here may be worthwhile. Our point of view (necessarily!) is
that of the communications engineer with a background in probability,
random variables, and stochastic processes, who feels he should
therefore know all about statistics until he reads a little in the area.

In order to apply statistics to the results of subjective testing,
one must either base the statistics on assumptions regarding the
underlying distributions of the individual listener responses (the
parametric approach), or assume that these underlying distributions are
unknown and work within, for example, ranking statistics (the
nonparametric approach). The parametric approach is treated from a
theoretical standpoint in many places: our favorites are Wilks [3.3]
and Cramer [3.4]. The nonparametric approach is also extensively
treated, but our favorite here is Hajek [3.5]. For applications with a
minimum of theory, a good reference among a great many possible choices
is Winer [3.6] or Siegel [3.7] for parametric or nonparametric tests,
respectively.

In  the  parametric  approach,  the  most  common  assumption  regarding 
the  distribution  of  the  listener  responses  is  that  they  are  all 
Gaussian.  Hypotheses  with  respect  to  common  means  and/or  variances 
under  test  conditions  can  then  be  set  up  and  inferences  drawn  by 
comparisons  with  standardized  tables. 

3.3.1  Distributions 

The key distributions are summarized below for convenience.

Chi-Square

Let $X_i$, $i = 1, \ldots, n$ be independent, identically distributed
Gaussian random variables, each with zero mean and unit variance. Then

\[ \chi^2 = \sum_{i=1}^{n} X_i^2 \tag{3.3.1} \]

is a new random variable, with a distribution called Chi-square with n
degrees of freedom. The probability density function is given by

\[ f_{\chi^2}(x) = \frac{x^{n/2 - 1}\, e^{-x/2}}{2^{n/2}\, \Gamma(n/2)}, \qquad x \ge 0. \tag{3.3.2} \]
F-Distribution

Let $X_1, \ldots, X_n$ and $Y_1, \ldots, Y_m$ be n+m independent,
identically distributed Gaussian random variables, each with zero mean
and unit variance. Then the ratio

is a random variable with a distribution called the F-distribution, with
parameters m and n. The probability density function is
Then t is a random variable which has a distribution called the Student's
distribution with parameter n. The probability density function is

\[ f_t(t) = \frac{\Gamma\!\left(\frac{n+1}{2}\right)}{\sqrt{n\pi}\;\Gamma\!\left(\frac{n}{2}\right)} \left(1 + \frac{t^2}{n}\right)^{-(n+1)/2} \tag{3.3.6} \]


Studentized Range Statistic

Let $X_1, \ldots, X_k$ be independent identically distributed Gaussian
random variables each with zero mean and unit variance. Define a
random variable Z as

\[ Z = \max_i X_i - \min_i X_i \tag{3.3.7} \]

as shown in Figure 3.1. The probability density function of Z is

\[ f_Z(z) = k(k-1) \int_{-\infty}^{\infty} f_X(x)\, f_X(x+z)\, \left[ F_X(x+z) - F_X(x) \right]^{k-2} dx, \qquad z \ge 0 \tag{3.3.8} \]

where $F_X(\cdot)$ is the Gaussian cumulative distribution function and
$f_X(\cdot)$ is the Gaussian probability density function, both for zero
mean, unit variance Gaussian random variables. This function is not
available in closed form unless k = 2. Some points of the cumulative
distribution function for Z have been tabulated. See for example the
tables of Winer [3.6]. For a derivation of (3.3.8), see Appendix B of
this report.

3.3.2  Estimation

We  consider  now  some  commonly  used  estimates  of  statistical 
parameters. 


Mean

Let $X_1, \ldots, X_n$ be independent identically distributed random
variables each with mean $\mu$ and variance $\sigma^2$. Then

\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \tag{3.3.9} \]

is called the sample mean. It is an unbiased estimate of the mean of
the $x_i$'s:

\[ E[\bar{x}] = \mu \tag{3.3.10} \]

\[ \mathrm{Var}(\bar{x}) = \frac{\sigma^2}{n} \tag{3.3.11} \]

Variance

For $X_1, \ldots, X_n$ independent identically distributed random
variables, each with mean $\mu$ and variance $\sigma^2$, the sample
variance

\[ s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 \tag{3.3.12} \]

is an unbiased estimate of the variance of the $x_i$'s, with

\[ E[s^2] = \sigma^2 \tag{3.3.13} \]

\[ \mathrm{Var}[s^2] = \frac{1}{n} \left( \mu_4 - \frac{n-3}{n-1}\, \sigma^4 \right) \tag{3.3.14} \]

where $\mu_4$ denotes the fourth central moment. If the $x_i$'s are
Gaussian as well, then $\bar{x}$ and $s^2$ are best mean square
estimates and are independent random variables. Also, in this case,

\[ \frac{\sqrt{n}\,(\bar{x} - \mu)}{s} \tag{3.3.15} \]

is a random variable with the Student's distribution with (n-1) degrees
of freedom.
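
The estimates of this section can be sketched in a few lines of Python.
This is a minimal illustration, not part of the original study: the
function names and the sample ratings are hypothetical, and the t
statistic is the quantity sqrt(n)(x-bar - mu)/s described above.

```python
import math

def sample_mean(xs):
    # Unbiased estimate of the mean.
    return sum(xs) / len(xs)

def sample_variance(xs):
    # Unbiased estimate of the variance (divides by n - 1).
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def t_statistic(xs, mu):
    # sqrt(n) * (sample mean - mu) / s, with n - 1 degrees of freedom
    # when the observations are Gaussian.
    n = len(xs)
    return math.sqrt(n) * (sample_mean(xs) - mu) / math.sqrt(sample_variance(xs))

# Hypothetical listener ratings of one system.
ratings = [62.0, 58.0, 65.0, 60.0, 55.0]
print(sample_mean(ratings), sample_variance(ratings), t_statistic(ratings, 58.0))
```

The resulting statistic would be compared against a tabulated Student's
distribution with n-1 degrees of freedom.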
3.3.3  Analysis of PARM Data

As an example of the application of the above results, let us
consider the problem of analysis of the PARM data. Let

    $R_{ijk}$ = the response of listener i to system j
                on the k-th presentation.

For a particular PARM module of data, we have

    $1 \le i \le L$,  L = the number of listeners in the module

    $1 \le j \le M$,  M = the number of systems in the module

    $1 \le k \le T$,  T = 10S = the number of times a system is
                      presented in a module, where S is the number
                      of speakers in the module.

For example, L=10, M=6 including anchors, S=3, T=30 might be a set of
parameters, with LMT = 1800 total responses in the module.

Let 
Then, combining results, we have

\[ \sigma^2_{\text{total}} = \frac{M-1}{MLT-1}\, \sigma^2_{\text{sys}} + \frac{M(LT-1)}{MLT-1}\, \sigma^2_{\text{error}} \]

Now, if $\sigma^2_{\text{sys}} = \sigma^2_{\text{error}}$, that is, if
the different systems themselves contribute no systematic differences
to the variance, then

\[ \sigma^2_{\text{total}} = \sigma^2_{\text{sys}} = \sigma^2_{\text{error}} \]


The F-test is used to test the hypothesis that
$\sigma^2_{\text{sys}} = \sigma^2_{\text{error}}$, by forming the ratio
of these variables, assuming that the Gaussian assumptions hold, and
utilizing tabulations of the cumulative distribution of the F variable
under the hypothesis. If the ratio is outside predetermined bounds,
the test is said to hold, that is, the two variances are not the same.
Otherwise, there is no conclusion. From the point of view of
statistical hypothesis testing, we test the hypothesis (systems
contribute no systematic difference). If F is too large, we reject the
hypothesis. This amounts to considering the hypothesis against a
specified false alarm probability, and not giving any other measure of
performance.
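
The variance ratio just described can be sketched as follows. This is
an illustration under assumptions, not the PARM analysis code: it uses
the standard one-way analysis of variance with the systems as groups,
pools all L*T responses for each system into one list, and the function
name `f_ratio` is hypothetical. The error term has M(LT-1) degrees of
freedom.

```python
def f_ratio(responses):
    """responses[j] holds all L*T responses collected for system j.

    Returns (F, (df_sys, df_error)) for a one-way analysis of variance.
    """
    M = len(responses)
    n = len(responses[0])                      # n = L*T responses per system
    means = [sum(r) / n for r in responses]    # per-system means
    grand = sum(means) / M                     # grand mean (equal group sizes)
    # Mean square between systems, M - 1 degrees of freedom.
    ms_sys = n * sum((m - grand) ** 2 for m in means) / (M - 1)
    # Mean square within systems (error), M(n - 1) degrees of freedom.
    ss_err = sum((x - m) ** 2 for r, m in zip(responses, means) for x in r)
    ms_err = ss_err / (M * (n - 1))
    return ms_sys / ms_err, (M - 1, M * (n - 1))

# Three hypothetical systems with three responses each.
F, dof = f_ratio([[1, 2, 3], [2, 3, 4], [6, 7, 8]])
print(F, dof)
```

The value of F would then be compared with a tabulated F distribution
for the returned degrees of freedom at the chosen false alarm
probability.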

For a comparison between pairs of means, one can use the
studentized range statistic as

\[ q_{\alpha, K, f} \]

where $\alpha$ is the desired quantile point of the cumulative
distribution of the statistic, $K = (i) - (i') + 1$, $K \le M$, is the
number of steps between the $\bar{R}$'s being compared when all the
$\bar{R}_j$'s are rank ordered, and $f = M(LT-1)$ is the degrees of
freedom of $\sigma^2_{\text{error}}$. When this test is organized in
matrix form to facilitate the comparison of all means for significance
of differences between pairs of means to level $\alpha$ of false alarm,
the procedure is called the Newman-Keuls test. (See Winer [3.6].)
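
The bookkeeping of the Newman-Keuls procedure can be sketched as below.
This is a minimal illustration under assumptions: the studentized range
for a pair is taken as the difference of the two means divided by
sqrt(MS_error / n), where n is the number of responses behind each mean,
and the function name is hypothetical. The tabulated critical value
q(alpha, K, f) still has to be looked up separately.

```python
import math

def newman_keuls_pairs(means, ms_error, n_per_mean):
    """Return (low_index, high_index, K, q) for every pair of means.

    K is the number of steps between the two means once all means are
    rank ordered; q is the studentized range for that pair.
    """
    order = sorted(range(len(means)), key=lambda j: means[j])
    se = math.sqrt(ms_error / n_per_mean)
    pairs = []
    for a in range(len(order)):
        for b in range(a + 1, len(order)):
            k_steps = b - a + 1                      # K = (i) - (i') + 1
            q = (means[order[b]] - means[order[a]]) / se
            pairs.append((order[a], order[b], k_steps, q))
    return pairs

# Three hypothetical system means, error mean square 1.0, 4 responses each.
for low, high, k_steps, q in newman_keuls_pairs([3.0, 1.0, 2.0], 1.0, 4):
    print(low, high, k_steps, q)
```

Each q would be declared significant only if it exceeds the tabulated
value for its own K, which is what distinguishes this procedure from a
single-threshold comparison.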

3.3.4  Nonparametric Tests

In nonparametric testing, one declines to assume that the
underlying statistics are Gaussian. Then one ranks the responses
corresponding to their relative magnitude, either signed or unsigned.
If the conditions hypothesized give no systematic differences in
responses, the rankings will be purely random, resulting in statistics
which for two conditions may be derived fairly easily. Common two
dimensional nonparametric tests resulting from various ranking
procedures are the Wilcoxon test, the Median test, the Van der Waerden
test, and the Kolmogorov-Smirnov test. Hajek [3.5] describes each of
these tests and gives underlying statistics for which each test is most
powerful. Unfortunately, no uniformly most powerful test exists. In
situations where underlying distributions may reasonably be assumed to
be Gaussian, a parametric test will in general be best.
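
As an illustration of the ranking idea, the following sketch computes
the Wilcoxon rank-sum statistic for two conditions. It is a generic
textbook form, not code from this study: tied responses receive the
average of the ranks they span, and the function names are hypothetical.

```python
def average_ranks(values):
    # Rank the pooled responses from 1 upward; ties share the average rank.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sum(a, b):
    """Return (W, expected W under no systematic difference) for sample a."""
    ranks = average_ranks(list(a) + list(b))
    w = sum(ranks[:len(a)])
    expected = len(a) * (len(a) + len(b) + 1) / 2
    return w, expected

print(rank_sum([1, 2, 3], [4, 5, 6]))
```

If the two conditions give no systematic difference, W scatters around
its expected value; a W far from that value is evidence of a difference.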

Nonparametric tests comparing more than two conditions are more
difficult to compose than the comparisons of pairs of conditions, as
all the rank statistics are in the higher order case derived from
multinomial as opposed to binomial type distributions. Although some
references are made to such procedures in Lehman, e.g. the
Kruskal-Wallis test, no convenient generally accepted multidimensional
nonparametric tests were found.

3.4  Conclusions and Recommendations

The following conclusions regarding subjective prediction of
user preference are drawn primarily on the basis of data available
from the analysis of the results of the PARM and QUART tests [3.1],
from the discussions at Georgia Tech and at Dynastat with W. D. Voiers,
and from the initial results of the DAM test.

3.4.1  Isometric Tests

In isometric tests such as PARM, the absolute rankings of system
conditions by individual listeners will have a high variance due to
individual listener idiosyncrasies and intralistener variability, in
addition to interlistener variability. These effects can be
balanced out by extremely careful post-test processing of responses to
establish common origins and scales within and across listeners. Such
processing is, inevitably, subject to some criticism, as any smoothing
of the data will also introduce some distortion of one kind as it
reduces other effects. Smoothing, centering, and scaling were
accomplished in the PARM tests based on the ratings and relative
ratings of the anchors. Although more efficient anchoring and
normalization procedures can clearly be devised, such tests will always
suffer from high variability and hence require large groups of
listeners and many trials, and will always be subject to criticism due
to post-test normalization procedures.
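
To make the idea of a common origin and scale concrete, the sketch
below rescales each listener's rating linearly so that his "bad" anchor
maps to 0 and his "good" anchor maps to 1. This is only an assumed,
simplified form of anchor normalization for illustration; the procedure
actually used in PARM is documented in [3.1] and is not reproduced here.
The function name is hypothetical.

```python
def normalize_to_anchors(rating, good_anchor, bad_anchor):
    # Linear rescaling: bad anchor -> 0.0, good anchor -> 1.0.
    span = good_anchor - bad_anchor
    if span == 0:
        raise ValueError("anchor ratings must differ")
    return (rating - bad_anchor) / span

# Two hypothetical listeners who use the raw scale very differently
# end up with the same normalized score for the same system.
print(normalize_to_anchors(50, 80, 20))   # listener A
print(normalize_to_anchors(35, 60, 10))   # listener B
```

The example also shows the source of the criticism noted above: the
normalized score depends entirely on how each listener rated the two
anchors.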

3.4.2  Tests  of  Features 

In order to devise an effective, efficient and reliable subjective
test, it is necessary to narrow the scope of the question asked of the
listener. That is, a more specific response than "Do you like this?"
must be solicited. If the features of the speech which are perceptually
most important in determining the overall user acceptability can be
identified and quantified, then one can construct an acceptability
rating with less variability within and across listeners.

This then becomes a problem of feature extraction. Two fronts or
approaches to this problem can be found: (a) List all the conceivable
descriptions of features. Test. Analyze the data with correlation
analysis and try to find the features which are important empirically.
(b) Based on extensive experience with various systems, select the most
typical types of noises and degradations. Try to solicit responses
along these particular features. Include effects of the environment such
as background noises. Feature selection using method (a) was used in
QUART. Subsequent refinement using the ideas of (b) as well has led
to the parameter sets of DAM. It is our judgment, based on the results
of DAM, that the best subjective preference testing procedure
available now is DAM. It should be pointed out that until the extensive,
expensive, detailed test results of PARM and QUART were available it
was not possible to draw this conclusion; however, the detailed
agreement of PARM and QUART, and the subsequent development of DAM,
leave no other conclusion.

3.4.3  Implementation of Subjective Tests

The monumental and time consuming task of conducting a subjective
listening test can be carried out with improved speed and efficiency
by using an interactive computer to control the test, collect the
data, and subsequently analyze the test data.

3.4.4  Size of the Test

The number of listeners which must be used in a subjective
testing procedure can be determined only after sufficient data is
accumulated with a particular test methodology or algorithm to permit
good estimation of the error variances. Then the number of responses
must be selected to give an adequate resolution of the data to separate
the systems under test. Note that the required resolution will also
depend on how different the systems to be resolved are on the scale of
interest.
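
A rough worked example of this tradeoff is sketched below. It rests on
assumptions not stated in the text: two systems are compared by their
mean ratings, each mean is based on n independent responses with a
common error standard deviation, and the difference is to exceed z
standard errors (z = 1.96 is a conventional choice). The function name
is hypothetical.

```python
import math

def responses_needed(error_std, difference, z=1.96):
    # The standard error of a difference of two means is
    # error_std * sqrt(2 / n); requiring difference >= z standard errors
    # gives n >= 2 * (z * error_std / difference) ** 2.
    return math.ceil(2.0 * (z * error_std / difference) ** 2)

# Halving the difference to be resolved quadruples the responses needed.
print(responses_needed(10.0, 5.0, z=2.0))
print(responses_needed(10.0, 2.5, z=2.0))
```

This makes explicit why an estimate of the error variance has to be in
hand before the size of a test can be fixed.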

3.4.5  Speaker Selection

The number of speakers has been found in QUART and PARM to be
less significant than previously thought, from the point of view of
statistical resolving power. However, from the point of view of system
design, it is clear that some systems will be highly biased toward low
pitched speech or moderately pitched speech, and perform quite poorly
on high pitched speech, or vice versa. Hence, it is considered essential
to use at least two, preferably three, speakers chosen to cover the
expected range of pitches. This strategy will at least quickly isolate
systems which will not, for example, respond to a female voice.

3.4.6  Overall Recommendations for Subjective Tests

The overall recommendation to come from this examination of
subjective tests and test facilities is the development of an interactive
computer based hardware facility for conducting a refined version of the
DAM test.
4.  A SUBJECTIVE COMMUNICABILITY TEST


4.1  Introduction 

When judging the performance of highly intelligible speech
communications systems, one approach is to apply an isometric subjective
user acceptability test, such as the PARM. The hypothesis in such
tests is that subjects can judge, from listening to speech segments
played through the systems being tested, the overall expected
acceptability of a system. The problem with these tests is that the
subjects' responses represent a noisy measure of the actual
acceptability of a system. In this context, the "ACCEPTABILITY" of a
system is defined as the level to which complex communication tasks can
be accomplished while using the system.

A model which states the problem more clearly is one which
postulates a fixed cognitive resource available to a user of a
communication system. As was discussed in Chapter 2, due to the
multiplicity of acoustic cues for segmental and suprasegmental features
in speech, and due to a listener's immense knowledge of the phonemics,
syntactics, and semantics of his language, a listener may well be able
to understand speech which is very distorted. The problem is that to do
so, he must devote a large portion of his cognitive resource to just
understanding what is being said. For a low quality system, therefore,
this leaves him relatively less cognitive resource to apply to the
communication task, making the communication task more difficult.


The definition of a "COMMUNICABILITY TEST," as used in this
chapter, is any test which tries to measure a user's performance on a
communication task while using a communication system. The idea is to
design tests in which users are not asked to rate systems, but rather
are asked to perform some task in which the subjects' performance may
be measured objectively. In order to be an acceptable communicability
test, therefore, the test must meet several requirements. First, the
communication task must be difficult enough so that a subject is using
most of his cognitive resource in performing the task even with no
system distortion. Second, a subject's performance on the task must be
easy to measure. Third, the test must be inexpensive to administer,
because it has enough inherent resolving power to differentiate among
the communications systems without excessive subject costs. Last, the
test should not require the actual use of a communication system in the
test, so that simulated systems may also be tested.

This chapter describes the design and testing of one such
communicability test. Section 4.2 describes the design of the automated
subjective data acquisition system used to administer the test. Section
4.3 describes the details of the test itself. Section 4.4 describes the
data analysis done in the test. Section 4.5 describes the test results.
4.2  An Automated Speech Subjective Quality Testing Facility


One of the greatest sources of expense in performing subjective
speech quality tests is the large amount of manual data handling
required to prepare the test results for computer analysis. In order
to reduce this source of expense, an automated subjective data
acquisition system was developed at Georgia Tech.

A diagram of the hardware portion of the subjective data
acquisition system is shown in Figure 4.1. The system consists of six
"STATIONS," each of which has an earphone control console, a CRT, and
a total of 16 buttons: fifteen "DATA" buttons and one "CONTROL" button.
The CRT is used for transmitting alphanumeric data to the subjects
through the computer's D/A interfaces, while the buttons are used for
collecting subject responses. The audio for the system is supplied
by a Crown 800 analog tape recorder which is digitally controlled. In
general, 1 kHz tones are placed on one track of the analog tape to mark
the ends of test sequences. These tones can be detected by the
computer through a phase lock loop detector, and are used to accurately
position the recorder.

In order to administer the test and collect the data, a multitask
interpretive test control program, called "QUALGOL," was written.
The QUALGOL language is summarized in Table 4.1, and has all the
necessary elements (constants, variables, labels, loop control,
arithmetic, etc.) for a simple computer language. Using the QUALGOL
language, an experimenter can easily "PROGRAM" a large class of
subjective tests on the quality testing facility. A program used for
administering some of the tests performed during this study is given
in Figure 4.2.
TABLE 4.1

QUALGOL LANGUAGE

CONVENTIONS:   V = VARIABLE
               N = CONSTANT

VARIABLES:     A-Z

COMMANDS:

C    CROWN
       C(V)          RECEIVE FROM CROWN
                       1 = TONE
                       0 = NO TONE
       C(N)          SEND TO CROWN
                       1      FAST FORWARD
                       2      STOP
                       3      PLAY
                       4      RECORD
                       5      REWIND
                       0,6,7  NO-OP

D    DELAY
       D(N)          DELAY N (.1 SEC) UNITS

DI   DISPLAY
       DI(N)         DISPLAY MESSAGE N

E    END

G    GET RESPONSES
       G(V)          GET V RESPONSES, DECREMENT V TO ZERO

I    INCREMENT
       I(V)          INCREMENT V BY ONE

J    JUMP
       J(V,LABEL)    JUMP TO LABEL IF V=0
       J(@,LABEL)    JUMP TO LABEL

M    MESSAGE
       M(N,"...")    DEFINE MESSAGE

P    PRINT
       P(V)          PRINT V

S    SET
       S(V,N)        SET V TO N

T    TRACE
       T             TRACE SWITCH

W    WAIT
       W(N)          WAIT N UNITS


M(1,LISTEN@TO@SAMPLE)
M(2, )
M(3,MAKE@CHOICE NOW)
M(4,PLEASE@MAKE CHOICE@NOW)
M(5,NOW STUPID)

s ()■:,- 100) 

C ( 3)  W (2)  C (0) 
l.T  C ( D)  J ( B , LI ) J ( CJ . L2 ) 
EN END


FIGURE 4.2  AN EXAMPLE "QUALGOL" PROGRAM
USED TO ADMINISTER THE COMMUNICABILITY TESTS
4.3  The Experimental Format


The communicability test format chosen for this study was a
"Multiple Digit Recall" test similar to that studied by Nakatani at
Bell Labs. In this format, sequences of random digits are first
recorded by trained speakers, and then these utterances are played
through various distorting systems. The resulting sequences are then
played to subjects whose task is to "RECALL" the digits after a short
(~ 1 sec.) wait. This test format meets all the basic criteria set
forth in the introduction, since the task does not require a quality
judgment on the part of the subjects, the test is simple to administer,
and the test does not require the communication system being tested to
be present.

The purpose of the study reported here was to examine the
usefulness of this test format for evaluating communication systems from
both a resolution and a cost point of view. It should be noted that this
study was a relatively small portion of the total effort, and the
results obtained should be considered preliminary in nature. The
tests were performed as follows. First, strings of random digits
were generated by the computer by a program which rejected all strings
which had double digits, had more than two digits in ascending or
descending sequence, or had more than two digits in ascending or
descending alternate (2-4-6, etc.) sequence. Forty random sequences were
generated in each of the 6, 7, 8, 9, and 10 digit lengths. Second, the
digit strings were read into a high quality tape recording system by a
trained announcer from the student broadcast radio station. The digits
were read "as if there were a list," so that no internal groupings were
imposed on the numbers. Third, the number strings were low pass
filtered to 3.2 kHz and digitized at 8 kHz to 12 bits resolution.
The results were stored on three 2400 ft., 800 BPI, 9 track digital
tapes.
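
The rejection rules for the digit strings can be sketched as follows.
This is an illustrative reconstruction, not the program used in the
study. It assumes that "double digits" means the same digit twice in a
row, and that the sequence rules apply to three adjacent digits with a
constant step of 1 (3-4-5), -1, 2 (2-4-6), or -2. The function names and
the seed are hypothetical.

```python
import random

def is_acceptable(digits):
    # Rule 1: no digit immediately repeated ("double digits").
    for a, b in zip(digits, digits[1:]):
        if a == b:
            return False
    # Rules 2 and 3: no three adjacent digits with a constant step of
    # +1/-1 (ascending or descending) or +2/-2 (alternate sequence).
    for a, b, c in zip(digits, digits[1:], digits[2:]):
        step = b - a
        if step in (1, -1, 2, -2) and c - b == step:
            return False
    return True

def random_digit_string(length, rng):
    # Draw strings until one passes all rejection rules.
    while True:
        digits = [rng.randrange(10) for _ in range(length)]
        if is_acceptable(digits):
            return digits

rng = random.Random(1977)   # hypothetical seed, for repeatability
strings = {n: [random_digit_string(n, rng) for _ in range(40)]
           for n in (6, 7, 8, 9, 10)}
print(strings[7][0])
```

Rejection sampling keeps the accepted strings uniformly distributed over
the allowed set, at the cost of discarding some draws.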

In all, four sets of tests were performed. In the first
"preliminary" test, undistorted data was played to subjects to try
to determine an appropriate number of digits for the final tests.
In all, the subjects listened to 200 sequences consisting of 40 each of
6, 7, 8, 9, and 10 digit strings. As a result of this test, digit
sequence lengths of 7 and 8 were chosen.

In the remaining three tests, distortions were applied to the
number strings, and these were played to subjects. Each of these
three tests compared the undistorted strings against three levels of
easily perceivable distortions. In the first test, the distortions
were white Gaussian noise at SNRs of 10 db, 8 db, and 5 db. In the
second test, the distortions were low pass filtering at cutoff
frequencies of 2.4 kHz, 1.8 kHz, and 1.2 kHz. In the third test, the
distortions were ADPCM waveform coder distortions at 24 KBPS, 16 KBPS,
and 8 KBPS. Each set of distortions was played to 18 subjects for a
total of 18x3x2x50 = 5400 responses.
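
As an illustration of the first distortion, the sketch below adds white
Gaussian noise to a sampled signal at a prescribed signal-to-noise
ratio. It is a generic construction under assumptions, not the
processing chain used in the study: the SNR is defined on average power
over the whole signal, and the function names are hypothetical.

```python
import math
import random

def noise_sigma(signal, snr_db):
    # Noise standard deviation giving the requested SNR in db,
    # where SNR = 10 log10(signal power / noise power).
    power = sum(x * x for x in signal) / len(signal)
    return math.sqrt(power / 10 ** (snr_db / 10))

def add_noise(signal, snr_db, seed=0):
    # Add independent zero mean Gaussian samples to every signal sample.
    rng = random.Random(seed)
    sigma = noise_sigma(signal, snr_db)
    return [x + rng.gauss(0.0, sigma) for x in signal]

# One second of a hypothetical unit-power test signal at 8 kHz.
signal = [1.0, -1.0] * 4000
noisy = add_noise(signal, 10.0)
print(noise_sigma(signal, 10.0))
```

For speech, the result depends on whether silent intervals are included
in the signal power, so that choice should be stated with any SNR
figure.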


4.4  The Data Analysis

The data analysis was done in three stages. First, the data were
entered into a general data base. Second, a program called "VERIFY"
examined the numbers for cases where the number of errors was greater
than three, or where the errors met a set of special conditions
(reversals, dropped numbers, etc.). In each case, the experimenter
could choose to omit the subject data. Third, a program called "SCORE"
allowed the analysis of the data base for the means and variances
necessary to use standard Student's t analysis and analysis of
variance techniques, and allowed the calculation of extensive
correlation sets.

In all, three types of scoring procedures were applied to the
data. In the first procedure, each response string was scored to be
either correct or not correct, and no note was made of the number of
errors in the string. The score statistic for this method was the
percentage of incorrect strings for each subject, for each distortion,
and for each test.

In the second scoring procedure, each response string was matched
to the correct string, and the score was taken to be the total number
of incorrect digits. In this scoring procedure, all response strings
with missing digits or response strings with the wrong number of
digits were given a score of d.

The third type of scoring was derived by classifying the types
of digit errors in the response strings. It was found that the
predominant type of error in the test was a two digit error obtained
from interchanging two digits. In the third scoring procedure, such
an inversion would be considered to be one error rather than two.
Rules were compiled to handle inversions of more than two numbers as
such cases appeared in the data.
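
The three scoring procedures can be sketched as below. This is a
simplified illustration, not the "SCORE" program: the fixed score given
to strings with the wrong number of digits is left as a parameter
(`length_penalty`, a hypothetical name), only interchanges of exactly
two digits are treated as inversions, and the additional rules for
longer inversions are not modeled.

```python
def score_string(response, correct):
    # First procedure: 1 if the string contains any error, else 0.
    return 0 if response == correct else 1

def score_digits(response, correct, length_penalty):
    # Second procedure: number of incorrect digits, with a fixed score
    # for strings that have the wrong number of digits.
    if len(response) != len(correct):
        return length_penalty
    return sum(r != c for r, c in zip(response, correct))

def score_with_inversions(response, correct, length_penalty):
    # Third procedure: as the second, but two interchanged digits
    # count as one error rather than two.
    if len(response) != len(correct):
        return length_penalty
    wrong = [i for i, (r, c) in enumerate(zip(response, correct)) if r != c]
    errors = len(wrong)
    used = set()
    for a in wrong:
        for b in wrong:
            if (a < b and a not in used and b not in used
                    and response[a] == correct[b]
                    and response[b] == correct[a]):
                used.update((a, b))
                errors -= 1
    return errors

print(score_digits("1243567", "1234567", 7),
      score_with_inversions("1243567", "1234567", 7))
```

Averaging `score_string` over a subject's strings gives the fraction of
incorrect strings used as the statistic of the first procedure.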

For the following discussion, each scored result will be referred
to by the designation $X_{tsdn}$, where t is the test number (t = 1 for
the additive noise test, t = 2 for the low pass filter test, and t = 3
for the ADPCM coding test), s is the subject number (18 per test,
$1 \le s \le S$, where S = 18), d is the distortion level (four for each
test: three distortions and "clear"; $1 \le d \le D$, where D = 4), and
n is the number of results per subject ($1 \le n \le N$, where N = 1 for
the first scoring, and N = 10 for the last two). For each test,
analysis of variance was used to determine the significance of the
entire test, while the Student's t statistic was used to determine
statistical significance between distortions. In each test, the
first 10 responses were considered to be "training" responses, and
were not included in the results. The analysis of variance was
performed by calculating the F statistic given by


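\[ F_t = \frac{\dfrac{1}{D-1} \displaystyle\sum_{d} \left( \bar{X}_{td} - \bar{X}_{t} \right)^2}{\dfrac{1}{D(SN-1)} \displaystyle\sum_{d} \sum_{s} \sum_{n} \left( X_{tsdn} - \bar{X}_{td} \right)^2} \tag{4.4.1} \]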


and testing for significance using the appropriate F distribution,
while the pairwise significance was tested by calculating the t
statistic

\[ t = \frac{\bar{X}_{td_1} - \bar{X}_{td_2}}{\cdots} \tag{4.4.2} \]

and finding the significance from the t distribution.
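
A pairwise comparison of two distortion levels can be sketched as
follows. The exact form of the statistic used in the study is not
assumed here; this is the standard two-sample Student's t with equal
numbers of scores per distortion and pooled variance, and the function
names are hypothetical.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    # Unbiased sample variance.
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def pairwise_t(a, b):
    """Two-sample t for equal-size groups; returns (t, degrees of freedom)."""
    n = len(a)
    if len(b) != n:
        raise ValueError("this sketch assumes equal group sizes")
    se = math.sqrt((variance(a) + variance(b)) / n)
    return (mean(a) - mean(b)) / se, 2 * n - 2

# Hypothetical scores for two distortion levels of one test.
t, dof = pairwise_t([1, 2, 3], [3, 4, 5])
print(t, dof)
```

The magnitude of t would be compared with a tabulated Student's
distribution for the returned degrees of freedom.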

4.5  The Experimental Results

Table 4.5.1 shows the results of the first scoring procedure as
applied to the three tests. A summary of the distortions for each test
is given in Table 4.5.2. The overwhelming point is that there are
                    7 Digit Test                     8 Digit Test
                    DISTORTION (t)                   DISTORTION (t)
            (1)    (2)    (3)    (4)    AV.          (1)    (2)

     (1)     X    1.86   2.00   2.71    .53           X     .37
     (2)     *     X      .14    .86    .56                  X
     (3)     *            X      .71    .60
     (4)     **                   X     .66

     (1)     X    1.29   2.29   2.43    .55           X    1.36
 1   (2)           X     1.00   1.14    .66                  X
     (3)     *            X      .14    .71           *
     (4)                          X     .73           *

     (1)     X    2.00   3.14   4.43    .56           X    1.36
 2   (2)     *     X     1.14   2.43    .67                  X
     (3)                  X     1.29    .74           *
     (4)     **    **             X     .81           **

t LEVEL FOR SIGNIFICANCE FOR NO REJECTED DATA

*  = Significance at .05
** = Significance at .01


TABLE 4.5.1  RESULTS OF UNSCREENED FIRST SCORING TESTS


TEST               DISTORTION
                   (1)     (2)         (3)         (4)

ADDITIVE NOISE     NONE    10 dB SNR   8 dB SNR    5 dB SNR
LOW PASS FILTER    NONE    2.4 kHz     1.8 kHz     1.2 kHz
ADPCM              NONE    24 KBPS     16 KBPS     8 KBPS


very few significant results using this scoring scheme. The major
problem here turns out to be the subject variations. Some subjects
are so "bad" that they get practically no strings correct. Others
are so "good" that they never miss. It was hence decided to screen
out subjects whose average error rate was outside the range .3 < error
rate < .7. This left 10 subjects on the first test, on the second,
and 10 on the third. The results for this scoring are shown in Table
4.5.3. Clearly, this screening improves the results, with a large
number of results significant at the .01 level. This same effect was
found to hold for the other two scoring procedures.
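
The screening rule described above can be sketched in a few lines. The subject identifiers and error rates here are hypothetical, and the function name is an assumption; only the .3 < error rate < .7 window comes from the text.

```python
def screen_subjects(error_rates, low=0.3, high=0.7):
    """Keep subjects whose average error rate lies strictly inside (low, high)."""
    return {subject: rate for subject, rate in error_rates.items() if low < rate < high}

# Hypothetical average error rates, for illustration only.
rates = {"s01": 0.05, "s02": 0.45, "s03": 0.98, "s04": 0.62}
kept = screen_subjects(rates)
print(sorted(kept))
```

Subjects who almost never err or almost always err contribute little information about differences between distortions, which is why removing them sharpens the significance levels.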

Tables 4.5.4 and 4.5.5 show the results from the second and third
scoring procedures. In these tests the subjects were screened exactly
as for the first scoring procedure. Several results are clear from
these two tests. First, both scoring procedures represent a considerable
improvement over the first procedure, with the third procedure having
a slight edge in significance. Second, the noise tests seem to have
less overall effect (less significance) than either the low pass
filter test or the ADPCM test. Third, the 7 digit test seems to be
generally more acceptable than the 8 digit test (higher significance
levels for the same number of subjects).


4.6 Conclusions

The purpose of this study was to ascertain the usefulness and cost
of the digit recall test as a communicability test for speech digitization
systems. The overall results must be stated to be that:

1. For the rather severe variations in distortions used in this
test, it was easily possible to differentiate between systems.
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8 Digit  Test 


7 Digit  Test 
DISTORTION 


DISTORTION 


NOISE 

TEST 


(1) 

(2) 

(3) 

(4) 

(1) 

(2) 

(3) 

(4) 

(1) 

X 

2.68 

3.35 

4.70 

•50  (1) 

X 

1.49 

2.42 

2.80 

(2) 

'Mrk 

X 

.67 

2.01 

.58  (2) 

X 

.93 

1.30 

(3) 

** 

X 

1.34 

.63  (3) 

* 

X 

.37 

(4) 

kic 

* 

X 

.65  (4) 

** 

X 

.28  (1) 

X 

2.01 

4.70 

6.71 

.51 

(1) 

X 

2.24  4.10 

4.47 

.37  (2) 

* 

X 

2.68 

4.70 

.63 

(2) 

★ 

X 1.86 

2.24 

*49  (3) 

** 

** 

X 

2.02 

.73 

(3) 

** 

* X 

.37 

.58  (4) 

★ * 

irk 

* 

X 

.75 

(4) 

4rk 

★ 

X 

ADPCM 

TEST 


.28 

(1) 

X 

3.58 

5.59 

8.05 

.54 

(1) 

2.61 

4.29 

5.22 

.44 

(2) 

** 

X 

2.01 

4.47 

.68 

(2) 

* 

X 

1.66 

2.61 

.53 

(3) 

** 

* 

X 

2.46 

.77 

(3) 

** 

* 

X 

.93 

.64 

(4) 

irk 

** 

* 

X 

.82 

(4) 

irk 

* 

X 

t LEVEL  FOR  SIGNIFICANCE  FOR  NO  REJECTED  DATA 

* Significance  at  .05 
**  Significance  at  .01 


TABLE  4.5.3  RESULTS  OF  SCREENED  FIRST  SCORING  TESTS 




                   7 Digit Test                        8 Digit Test
                   DISTORTION                          DISTORTION
TEST          (1)    (2)    (3)    (4)    AV.     (1)    (2)    (3)    (4)    AV.

NOISE   (1)    X    2.47   5.22   8.51    .62      X    3.84   5.11   6.14
        (2)   **     X     2.75   6.04    .72     **     X     1.28   2.30    .99
        (3)   **    **      X     3.30    .81     **            X     1.02   1.04
        (4)   **    **     **      X              **    **             X     1.08

LPF     (1)    X    4.67   6.32  10.44    .60      X    5.37   7.19   8.55    .82
        (2)   **     X     1.65   5.77    .77     **     X     1.79   3.58   1.03
        (3)   **     *      X     4.12    .83     **     *      X     1.79   1.10
        (4)   **    **     **      X              **    **      *      X     1.17

ADPCM   (1)    X    4.39   6.87   8.79    .58      X    4.60   6.65   8.18    .83
        (2)   **     X     2.41   4.39    .74     **     X     2.05   3.58   1.01
        (3)   **    **      X     1.92            **    **      X     1.53   1.09
        (4)   **    **     **      X      .90     **    **      *      X     1.15

t LEVEL FOR SIGNIFICANCE FOR NO REJECTED DATA

*  = Significance at .05
** = Significance at .01


TABLE 4.5.4 RESULTS OF THE SCREENED TESTS USING THE
SECOND SCORING METHOD




                   7 Digit Test                        8 Digit Test
                   DISTORTION                          DISTORTION
TEST          (1)    (2)    (3)    (4)    AV.     (1)    (2)    (3)    (4)    AV.

NOISE   (1)    X    3.85   7.42  10.44    .53      X    4.86   7.67   9.46    .63
        (2)   **     X     3.57   6.59    .67     **     X     2.81   4.60    .82
        (3)   **    **      X     3.02    .80     **    **      X     1.79    .93
        (4)   **    **     **      X      .91     **    **      *      X     1.00

LPF     (1)    X    3.57   7.69   9.89    .51      X    5.63   8.44  10.74    .61
        (2)   **     X     4.12   6.32    .64     **     X     2.81   5.11    .83
        (3)   **    **      X     2.20    .79     **    **      X     2.30    .94
        (4)   **    **     **      X      .87     **    **     **      X     1.03

ADPCM   (1)    X    4.94   7.69   9.61    .52      X    5.63   8.69  10.74    .60
        (2)   **     X     2.75   4.67            **     X     3.07   5.11    .82
        (3)                 X     1.96    .80     **            X     2.05    .94
        (4)   **    **      *      X      .87     **    **     **      X     1.02

t LEVEL FOR SIGNIFICANCE FOR NO REJECTED DATA

*  = Significance at .05
** = Significance at .01

TABLE 4.5.5 RESULTS OF SCREENED TESTS USING THE THIRD SCORING METHOD
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2.  The  cost  of  this  test  is  quite  high  when  compared  to  other 
speech  quality  and  speech  intelligibility  tests. 

3. There is great subject variability, indicating that results
might be improved substantially by using a trained, well
documented crew of listeners.

4.  For  this  particular  group  of  subjects,  7 digits  seemed  about 
right.  Clearly,  however,  for  some  7 was  too  many,  while  for 
others,  8 was  too  few. 

5.  The  test  is  a very  unpleasant  test  in  which  to  participate. 

6.  The  ability  of  digit  recall  tests  to  differentiate  between 
systems  which  are  closely  matched  for  performance  is  limited, 
and  would  require  considerable  cost. 

In  summary,  it  may  be  said  that,  even  though  this  type  of 
communicability  test  can  be  argued  to  be  more  appropriate  than 
subjective  preference  testing,  and  even  though  it  is  possible,  as  shown 
in  this  study,  to  differentiate  among  distorting  systems,  still  the 
excessive  cost  of  communicability  testing  required  to  obtain  the 
desired  significance  levels  makes  these  tests  unattractive. 
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SPEECH  ACCEPTABILITY  EVALUATION  AT  DYNASTAT: 
THE  DIAGNOSTIC  ACCEPTABILITY  MEASURE  (DAM) 

BACKGROUND 


It is a matter of common observation that user acceptance
of voice communications equipment depends on factors
other than speech intelligibility. Although a high degree of
intelligibility is generally a necessary condition, it is not
a sufficient condition of user acceptance. But until recently,
no generally satisfactory method of evaluating the overall
acceptability or "quality" of processed or transmitted speech
has been available. Among the previously available methods,
some are applicable only for certain types of speech signal
degradation. Others are of limited reliability. Virtually
none permits reliable system evaluation in absolute terms for
the diversity of processing techniques and transmissions
encountered in modern digital voice communications.

Under contract with the Defense Communications Agency,
Dynastat recently undertook to fill the need that existed
in the area of acceptability evaluation. The results of this
effort included the Paired Acceptability Rating Method (PARM)
and the Quality Acceptance Rating Test (QUART), both of which
provide improved reliability of measurement on an absolute scale
of acceptability. Having met the interim needs of the Narrow
Band Voice Consortium, they also served as valuable research
tools to clarify a number of crucial methodological issues and
to indicate possible means of further refining the technology
of speech evaluation.

Drawing on insights gained in the course of its contractual
activities with PARM and QUART, Dynastat continued under
its own auspices to further advance the technology of communication
system evaluation from the standpoint of overall speech acceptability.
These efforts culminated in the Diagnostic Acceptability
Measure (DAM).

THE  DIAGNOSTIC  ACCEPTABILITY  MEASURE 

The Diagnostic Acceptability Measure combines direct
(isometric) and indirect (parametric) approaches to acceptability
evaluation by means of a twenty-item system rating form.*
Ten of the items on the form are concerned with the
acceptability-related perceptual qualities of the speech signal
itself. Seven items are concerned with the perceptual qualities
of the background. Three items are concerned with the perceived
intelligibility, pleasantness, and overall acceptability of the
total effect. The descriptors used to define the various perceptual
qualities are the end products of an extensive program of
research concerned with the nature of these qualities and with
the development of a precise vocabulary for characterizing them.

The results of further research have indicated that
listeners' perceptions of modern digital voice communication
systems  and  diverse  forms  of  laboratory  degradation  can  be 
exhaustively  characterized  in  terms  of  six  elementary  perceived 


* The isometric approach requires the listener to provide a
direct subjective assessment of the acceptability of a sample
speech transmission. The parametric approach requires the
listener to evaluate the sample transmission with respect to
various perceived characteristics or qualities (e.g., noisiness)
independently of his individual affective reactions to these
qualities. Hence, the parametric approach tends to minimize
the sampling error associated with individual differences in
"taste." The individual who does not personally place a high
valuation on a particular speech quality may nevertheless
provide information of use in predicting the typical individual's
acceptance of speech characterized by a given degree
of that perceptual quality.
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qualities  of  the  signal  and  three  perceived  qualities  of  the 
background.  Measures  of  these  elementary  qualities  are 
obtained  by  various  combinations  of  rating  scale  data. 

In  accordance  with  the  above  research  results,  DAM 
rating  data  are  presently  analyzed  to  yield  system  diagnoses 
with  respect  to  the  nine  perceptual  qualities  indicated  in 
Table  1.  The  contribution  of  each  of  these  qualities  to  the 
listener's  acceptance  reaction  has  been  determined,  so  that  each 
diagnostic  score  can  be  expressed  in  terms  of  the  level  of 
acceptability  a system  would  be  accorded  if  it  were  deficient 
with  respect  only  to  the  single  perceptual  quality  involved. 
Expressed  in  this  way,  the  pattern  of  diagnostic  scores  reflects 
the  relative  contribution  of  each  perceptual  quality  to  the 
acceptability  of  the  system,  and  permits  the  system  developer 
to  concentrate  on  the  perceived  characteristics  of  his  system 
which  are  most  detrimental  to  its  acceptance. 

The application of multiple, nonlinear regression techniques
to a set of diagnostic scores permits the derivation of
supplementary, parametric estimates of intelligibility, pleasantness,
and acceptability, which can be combined with direct, or
isometric, rating data to yield highly reliable and valid estimates
of all three of these properties. For practical purposes of
system evaluation, however, parametric predictions are presently
provided only for acceptability.

To permit comparisons with the results of tests previously
conducted with PARM, DAM acceptability results are transformed
to their PARM equivalents. A transformation of judged
intelligibility results permits estimates of equivalent DRT total
scores.


Rigorous procedures for monitoring and screening of
listening crew members contribute significantly to the reliability
of DAM results.


TABLE 1. SYSTEM CHARACTERISTICS EVALUATED BY DAM
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Speaker  differences  are  relatively  small  with  DAM, 
particularly  within  sexes.  Depending  on  the  purposes  of  the 
investigator, however, the use of more than one speaker may
be  appropriate. 

The speech materials used for purposes of DAM evaluations
consist of 12 phonemically controlled sentences, spoken
by each of the desired number of speakers. Approximately one
minute total running time is required for each speaker.

Figure 1 shows the standard format in which DAM results
are reported. Presented first are the basic diagnostic scores
and their standard errors. Each diagnostic score represents
one estimate of the acceptability rating the system being evaluated
would receive if it were deficient only with respect to
the corresponding perceptual quality. Summary scores, representing
the combined effects of signal qualities and background
qualities, respectively, are also shown. Gross scores relating
to acceptability, judged pleasantness, and judged intelligibility
are shown in the bottom half of the figure.

Isometric  scores  are  based  only  on  direct  ratings  of 
the  respective  characteristics. 

Parametric scores are based on predictions of acceptability
from combined diagnostic scores for signal quality and
combined diagnostic scores for background quality.

Composite scores for acceptability are based on
isometric scores for acceptability, parametric scores for
acceptability, and on composite ratings of pleasantness and
intelligibility.

DIAGNOSTIC ACCEPTABILITY MEASURE RESULTS
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Equivalent  PARM  scores  and  Equivalent  DRT  scores  are 
currently  obtained  by  simple  linear  regression  techniques 
applied to composite acceptability scores and isometric intelligibility
ratings, respectively. However, it is expected that
more  precise  estimates  of  DRT  scores  will  be  obtained  in  the 
future  through  the  application  of  multiple  prediction  techniques 
to  the  DAM  diagnostic  scores.  Fig.  2 shows  the  correlation 
between  DAM  acceptability  scores  (composite)  and  PARM  test 
results  for  a sample  of  modern  digital  voice  communication 
systems. Fig. 3 shows the correlation between isometric intelligibility
ratings and DRT total scores.
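
The simple linear regression mentioned above, used to map a composite acceptability score onto an equivalent PARM score, can be sketched as follows. The score values are invented for illustration, and the function names are assumptions; Dynastat's actual regression coefficients are not given in the text.

```python
from math import sqrt

def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def correlation(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sqrt(sxx * syy)

# Hypothetical paired scores for four systems (not data from the report).
dam_composite = [40.0, 48.0, 55.0, 61.0]
parm_scores = [35.0, 45.0, 58.0, 66.0]

slope, intercept = fit_line(dam_composite, parm_scores)
equivalent_parm = slope * 50.0 + intercept  # PARM equivalent of a DAM score of 50
```

The same two functions would apply to the mapping from isometric intelligibility ratings to equivalent DRT scores, with a different pair of score lists.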

DAM evaluations have been performed on an extremely
broad sample of state-of-the-art narrow band and broad band
digital voice communication systems. Norms for various conditions
of speech/noise ratio, band restriction, and other simple
forms of signal degradation have also been established. These
normative data provide Dynastat with a truly unique capability
for detailed, useful interpretation of DAM results for future experimental
systems or conditions. Research, contemplated and in
progress, will serve to expand DAM's range of application and
provide norms for yet-to-be-encountered processing techniques
and transmission conditions.

For further information regarding the technical aspects
of the DAM and the evaluation services Dynastat offers with
it, please contact:


Dr.  William  D.  Voiers 
Dynastat,  Inc. 

2704  Rio  Grande,  Suite  4 
Austin,  Texas  78705 


Phone:  (512)  476-4797 


Administrative  or  contractual  information  relating 
to  Dynastat's  services  with  the  DAM  may  be  obtained  from 
Mr.  Ira  L.  Panzer  at  the  same  address  and  phone  number. 
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APPENDIX  B 


DERIVATION OF THE PROBABILITY DENSITY FUNCTION
FOR THE STUDENTIZED RANGE STATISTIC

From Figure 3.1< let

Then the joint probability density of α and β is

$$
f_{\alpha,\beta}(x,y) = \frac{\partial^2}{\partial x\,\partial y} F_{\alpha,\beta}(x,y)
$$

Thus

$$
f_{\alpha,\beta}(x,y) = k(k-1)\,\left[F_x(x) - F_x(y)\right]^{k-2} f_x(x)\, f_x(y)
$$
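
A quick numerical check of the product form k(k-1)[F(x) - F(y)]^(k-2) f(x) f(y) is sketched below. It assumes, since the definitions are not reproduced here, that the two variables are the largest and smallest of k independent samples drawn from a common distribution, and it uses the uniform distribution on (0, 1) purely for convenience, so that F(u) = u and f(u) = 1. The function names are hypothetical.

```python
def joint_density_uniform(x, y, k):
    """k(k-1)[F(x)-F(y)]^(k-2) f(x) f(y) for uniform(0,1) samples; zero unless y < x."""
    if not (0.0 <= y < x <= 1.0):
        return 0.0
    return k * (k - 1) * (x - y) ** (k - 2)

def integrate_density(k, steps=200):
    """Midpoint-rule integral of the joint density over the unit square."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        for j in range(steps):
            y = (j + 0.5) * h
            total += joint_density_uniform(x, y, k) * h * h
    return total
```

For a valid joint density the integral should be close to one, which is what `integrate_density(3)` and `integrate_density(4)` are expected to show up to the discretization error of the grid.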
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APPENDIX  C 

MINICOMPUTER  BASED 
DIGITAL  SIGNAL  PROCESSING  LABORATORY 
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A minicomputer-based  Digital  Signal  Processing  Laboratory  has 
been  under  construction  at  Georgia  Tech  since  August  1973.  It  is  now 
an extensive hardware-software complex dedicated to research and
instruction  in  many  digital  signal  processing  and  minicomputer  related 
areas.  This  appendix  describes  briefly  the  elements  of  this  system. 

The system is based upon three minicomputers: an Eclipse S/230
with 64K of 16-bit memory and a NOVA 830 with 64K of 16-bit memory in
the Research Lab, and a NOVA 820 with 32K of 16-bit memory in the
Student Lab. The uses of these computers are numerous and diverse.
Hence, the various hardware and software components of the system will
be presented separately.

THE  RESEARCH  COMPUTERS 

A block diagram of the basic research computer facility is
shown in Figure 1. Included in this section are only those peripherals
which are used by many applications. A full set of peripherals is
listed in Table 1.

The computational power for the system is supplied by two
grounds of the Eclipse S/230, which has 64K of 16-bit semiconductor
memory (+ CACHE), a floating-point processor, hardware multiply-divide,
a memory management unit, and writable control storage
(for microprogramming the processor), and by one ground of the NOVA
830, which has a floating-point processor, hardware multiply-divide,
a memory management unit, and 64K of 1 μsec 16-bit memory. Bulk
storage is supplied by three discs. The main disc is a 192 M byte
moving head drive shared by the Eclipse and the NOVA 830. Each of the
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TABLE  1 


I/O  DEVICES  ON  THE  NOVA  830  I/O  BUSS 


DATA  GENERAL  INTERFACES 

Diablo  33  disc  controller  (5  M bytes) 
Diablo  44  disc  controller  (10  M bytes) 
NOVA  cassette  controller 
Real  time  clock 

Floating-point  arithmetic  unit 

Memory  management 

Data  General  mag  tape  controller 

RS-232  interface  at  9600  baud 

RS-232  interface  at  1200  baud 

Inter-processor  buss 

Comtal  video  system  interface 


INTERFACES  CONSTRUCTED  AT  GEORGIA  TECH 

Programmable  sampling  clock 
RS-232  variable  baud  clock 
Joy  stick  interface 
Light  pen  interface 
Button  box  interface 
RS-232  interface  (2) 

16  bit  double  buffered  D-to-A 

10  bit  single  buffered  D-to-A  (4) 

A-to-D/sample  and  hold/analog  multiplexer 

Ampex  analog  tape  deck  control 

Revox  analog  tape  deck  control 

Crown  analog  tape  deck  control 

Kennedy  7-track  digital  tape  interface 

Line  printer  interface 

Card  reader  interface 

Paper  tape  reader  interface 

Programmable  stack  (256  words) 

Quality  test  interface 
Universal  card  tester  interface 
Time-of-day  and  date  clock 
Control  card  testing  interface 
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other  two  disc  units  is  of  the  moving  head  type,  and  each  has  one 
fixed  and  one  removable  pack.  The  Diablo  model  44  disc  has  10  M byte 
capacity,  and  is  used  by  the  Eclipse  alone.  The  Diablo  model  33  has 
5 M byte  capacity,  and  is  shared  by  the  NOVA  830  and  the  NOVA  820 
(instructional)  computers.  Additional  bulk  storage  is  supplied  by 
two  tape  units,  a NOVA  cassette  tape  and  a 7-track  digital  unit  (a 
9-track unit is on order from Data General). The cassette is a standard
Data General peripheral, while the 7-track was interfaced at Georgia
Tech.

Additional  general  purpose  devices  include  a card  reader,  a 
line  printer,  a paper  tape  reader,  and  a paper  tape  punch.  These 
units  were  all  interfaced  at  Georgia  Tech. 

The foreground of the NOVA 830 is used as a general peripheral
control ground for sharing the scarce peripherals. Almost all of the
general purpose and special purpose peripherals in the system are
interfaced to the NOVA 830 (see Table 1), and all the other grounds
on the other computers in the system go through this ground to access
these peripherals.

THE  GRAPHICS  SUBSYSTEM 

One  of  the  major  design  criteria  for  this  system  was  a high 
level  of  high  speed  graphical  interaction  between  the  user  and  the 
computer.  Figure  2 shows  the  hardware  associated  with  the  graphical 
subsystem. 

This  system  supports  many  types  of  graphical  interaction. 

First, it supports line printer graphics both in the axis-graph mode
and in the X-Y-Z mode for picture reproduction. Second, the Tektronix
4010 graphical unit gives storage type vector graphics at 9600 baud
and  cross  hair  feedback  interaction.  Third,  refresh  graphics  is 
supplied  by  driving  X-Y-Z  CRT's  directly  from  3 of  the  D-to-A's.  A 
light  pen  (built  at  Georgia  Tech),  along  with  two  joy  sticks,  3 button 
boxes,  and  two  potentiometers  give  interaction  in  the  refresh  mode. 
Fourth,  a CALCOMP  incremental  plotter  (interfaced  at  Georgia  Tech), 
gives  hard  copy  capability  in  the  vector  and  character  modes.  Last, 
a Comtal  video  processor  gives  X-Y-Z  CRT  support,  on  a 512x512  display 
with  eight  bits  resolution. 

THE  AUDIO  SUBSYSTEM 

A diagram  of  the  audio  subsystem  is  given  in  Figure  3.  This 
subsystem  was  constructed  as  an  aid  to  interactive  speech  processing. 

The  whole  system  is  centered  on  a patch  bay  located  with  the 
NOVA 830. This patch bay gives the user great flexibility in interconnecting
the individual audio components.

Data  acquisition  is  handled  through  a 12-bit  (10  nsec)  A-to-D 
with  an  8-channel  analog  multiplexer  on  its  input.  Data  playback  is 
handled  by  a 16-bit  double  buffered  D-to-A.  The  sampling  rate  on 
these  two  units  is  controlled  by  a programmable  clock.  Four  additional 
channels  of  8 bit  D-to-A's  form  single  buffered  analog  outputs.  The 
entire  data  acquisition  and  playback  system  was  built  at  Georgia  Tech. 
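
To make the word lengths in the acquisition and playback path concrete, the sketch below quantizes a sample to a 12-bit code and left-justifies it in a 16-bit output word. The signed two's-complement coding, the full-scale range of -1.0 to +1.0, and the function names are assumptions made for illustration; the text does not describe the actual coding used by the Georgia Tech hardware.

```python
from math import floor

def adc12(x):
    """Quantize x in [-1.0, 1.0) to a signed 12-bit code in -2048..2047 (assumed coding)."""
    code = floor(x * 2048 + 0.5)           # round to the nearest quantization level
    return max(-2048, min(2047, code))     # clip at the converter's limits

def to_dac16(code12):
    """Left-justify a 12-bit code in a 16-bit D-to-A word (multiply by 2**4)."""
    return code12 * 16

samples = [0.0, 0.5, -0.25, 0.9999]        # hypothetical input voltages, full scale = 1.0
words = [to_dac16(adc12(s)) for s in samples]
```

Left-justifying keeps the playback level of 12-bit recordings comparable with that of signals computed at full 16-bit precision.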

Four  analog  tape  drives  are  available  for  use  with  the  system. 
Two  of  these,  a Crown  800  and  a Revox  tape  drive,  are  interfaced  so 
they  may  be  controlled  by  the  computer.  The  Crown  interface  allows 
the  positioning  of  the  tape  to  any  desired  position  (within  tape 
stretch). Either of the two Ampex drives may be used under computer
control in place of the Revox.

Four variable filters and three audio amplifiers are also
available for use with this system.

SPEECH  QUALITY  TEST  SUBSYSTEM 

The speech quality test subsystem depicted in Figure 4 is
designed for the automated control of subjective quality tests. The
subsystem consists of six stations, located in a separate speech quality
laboratory and controlled by the NOVA 830 computer. Each of the
stations has a CRT, 15 response buttons, a "ready" button, earphones,
and a volume control for each ear. The computer interface can read the
buttons at any station, clear and set the ready flip flop, and, using a
software character generator, display messages to the subjects on the
CRT's.

This quality system has several distinct advantages over a
non-automated system. First, it eliminates much of the hand work of
data reduction. Second, it allows on-line statistical analysis. Last,
it allows the subjective test to reconfigure itself based on the subject
responses.

THE OPTICAL DATA PROCESSING SUBSYSTEM

A diagram of the optical data processing facility is given in
Figure 5. This subsystem has three components. The first component,
the "picture acquisition" component, consists of a Micro NOVA microcomputer
(in Dr. William Rhodes' laboratory) which controls an
electro-mechanical scanner. This equipment is still under development.
Second, the Micro NOVA also controls an optical data digitizer for
picture acquisition. The third component in this system is the
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FIGURE  4 

The  Speech  Quality  Testing  Subsystem 
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FIGURE  5 

The  Optical  Data  Processing  Subsystem 
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"picture  playback"  facility.  This  facility  consists  of  3 D-to-A’s 
and  two  CRT  scopes.  One  CRT  is  of  the  storage  type,  and  allows  quick 
viewing  of  the  pictures  being  displayed.  The  second  CRT  is  equipped 
with  a scope  camera.  The  interchangeable  backs  on  this  camera  allow 
the  production  of  either  Polaroid  or  120  roll  film  pictures.  The 
Comtal  video  system  can  also  be  used  to  produce  pictures. 

THE  COMPUTER  NETWORK  SUBSYSTEM 

A "star"  computer  network  is  currently  under  development  in  the 
digital  signal  processing  laboratory.  The  basic  hardware  for  this 
system  is  shown  in  Figure  6.  The  NOVA  830  communicates  with  the 
Eclipse through an interprocessor buss (IPB), and with several other
computers  through  high  speed,  variable  baud  rate,  RS-232  standard, 
asynchronous,  serial  interfaces.  These  RS-232  interfaces  were  designed 
and  built  at  Georgia  Tech,  and  are  capable  of  speeds  up  to  152K  baud. 

The  hardware  for  this  system  exists  and  is  tested.  The  software 
is  currently  under  development. 

THE  UNIVERSAL  CARD  TESTER  AND  THE  HARDWARE  PHILOSOPHY 

One of the most important subsystems of the digital signal processing
laboratory is the universal card tester. To understand how this
is used, it is important to understand the hardware philosophy of the
laboratory. Most of the hardware constructed in the laboratory is
constructed in prebuilt chassis. Each chassis contains 40 56-pin
connectors. The computer I/O buss enters each chassis and is split
into 3 sub-busses, called the "data buss," the "control buss," and the
"address buss." If this is not the final chassis on the daisy chain,
the busses are regrouped, and extended to the next chassis.
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FIGURE  7 

The Universal Card Tester
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The  hardware  interfaces  constructed  in  the  chassis  are  mostly 
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constructed  from  pre-designed  printed  circuit  boards.  A list  of  the 
PC  cards  available  for  interface  construction  is  given  in  Table  2. 

Most  interfaces  consist  of  using  some  set  of  "standard"  cards  with, 
perhaps,  some  additional  construction. 

The main problem in hardware construction, therefore, is in
building  and  testing  the  "standard"  cards,  often  with  semi-skilled 
labor.  This  is  the  purpose  of  the  universal  card  tester. 

A diagram of the universal card tester is given in Figure 7.

The tester has a switch panel, a patch panel, and a single "standard"
56-pin connector, each pin of which can serve as an "input," an
"output," or an "external."

Each pin has a parallel connection to the patch panel for external
connection. The computer can read or write individual bits to any pin
position. Hence, any desired input/output sequence can be presented
to a card being tested, and the results can be read back by the
computer.
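
The test procedure described above (drive chosen pins, read others back, compare against expectations) can be sketched in software. Everything in this sketch is hypothetical: the `CardTester` class, the model of a card as a Python function, and the example NAND card are illustrations only and do not describe the laboratory's actual test package.

```python
class CardTester:
    """Hypothetical software model of a 56-pin tester: drive input pins, read output pins."""

    PINS = 56

    def __init__(self, card):
        self.card = card      # callable mapping {pin: bit} driven -> {pin: bit} produced
        self.driven = {}

    def write_pin(self, pin, bit):
        if not 1 <= pin <= self.PINS:
            raise ValueError("pin out of range")
        self.driven[pin] = bit & 1

    def read_pin(self, pin):
        # Undriven outputs read as 0 in this simplified model.
        return self.card(dict(self.driven)).get(pin, 0)

def nand_card(driven):
    """Hypothetical card under test: pin 3 carries NAND of pins 1 and 2."""
    return {3: 1 - (driven.get(1, 0) & driven.get(2, 0))}

def run_test(tester, vectors):
    """Apply (inputs, expected) pairs in order; return a list of mismatches found."""
    failures = []
    for inputs, expected in vectors:
        for pin, bit in inputs.items():
            tester.write_pin(pin, bit)
        for pin, want in expected.items():
            got = tester.read_pin(pin)
            if got != want:
                failures.append((inputs, pin, want, got))
    return failures

truth_table = [({1: a, 2: b}, {3: 1 - (a & b)}) for a in (0, 1) for b in (0, 1)]
failures = run_test(CardTester(nand_card), truth_table)
```

Expressing each card's test as a table of input/output vectors is what lets the same procedure be run by semi-skilled labor on every copy of a standard card.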

The  software  package  associated  with  the  card  tester  allows  the 
user to test and debug any of the standard cards. In addition, a
special  card  allows  the  testing  of  individual  integrated  circuit  chips. 

THE  BASIC  INSTRUCTIONAL  COMPUTER  (NOVA  820) 

The NOVA 820 computer and its associated peripherals form a
computer and signal processing facility dedicated to student activities.
These  activities  mainly  include  several  laboratories  associated  with 
course  and  student  project  work.  The  hardware  is  configured  so  as  to 
allow  maximum  utilization  of  the  software  developed  in  the  research 
laboratory. 
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TABLE  2 


STANDARD PC CARDS USED IN THE MODULAR CONSTRUCTION SYSTEM


CARD NAME            PURPOSE

Single Address       Address decode
Dual Address         Address decode
Control              Interrupt control
Input buffer         16 bit input buffer
Output buffer        16 bit output buffer
DMA                  Direct memory access control
Counter              16 bit up/down counter
Memory               256x256 bit high speed memory (43 msec)
RS-232 (1)           High speed serial converter
RS-232 (2)           Medium speed serial converter
M6800 CPU            Micro-processor CPU
M6800 Memory (1)     Micro-processor memory (4K RAM)
M6800 Memory (2)     Micro-processor memory (4K RAM, 4K EPROM)
M6800 Buffer         Micro-processor buffer
M6800 Control        Micro-processor interrupt control
Kluge                General purpose


Figure  8 shows  the  basic  NOVA  820  computer  system  and  Table  3 
gives  a list  of  peripherals.  The  CPU  has  32K  of  800  nsec  memory  and  a 
hardware  multiply-divide  unit.  Bulk  storage  is  formed  by  two  moving- 
head  disc  drives  totaling  5 M bytes  of  storage.  These  discs  are  shared 
with  the  NOVA  830,  and  communication  between  the  processors  is 
maintained  on  a high  speed  RS-232  port. 

Many  of  the  peripherals  have  been  constructed  so  as  to  be 
identical,  from  a computer  command  viewpoint,  to  those  on  the  research 
facility.  Hence,  the  D-to-A's,  the  double  buffered  D-to-A's,  the 
A-to-D,  the  A-to-D  8-channel  analog  multiplexer,  and  the  programmable 
clock  all  utilize  the  same  commands  as  their  counterparts  on  the  NOVA 
830.  These  peripherals  give  the  NOVA  820  a similar  audio  and  refresh 
graphics  capability  to  the  NOVA  830. 

Interactive graphics on the NOVA 820 is handled by an M6800
controlled plasma terminal designed to look like a Tektronix 4010.
Hence, all the graphics packages developed for the NOVA 830 will run
on the NOVA 820.

FIGURE 8

The NOVA 820 Basic System


TABLE 3

I/O DEVICES ON THE NOVA 820 I/O

DATA GENERAL INTERFACES

Diablo 33 disc controller
RS-232 interface at 1200 baud
Inter-processor buss

INTERFACES CONSTRUCTED AT GEORGIA TECH

Programmable sampling clock
Light pen interface
16 bit double buffered D-to-A
10 bit single buffered D-to-A (4)
A-to-D/sample and hold/analog multiplexer
Line printer/M6800 input interface
M6800 Micro-computer CPU
M6800 4K memory module (2)
M6800 control and communication interface
Plasma display interface


THE MICRO-COMPUTER SUBSYSTEM (M6800)

One of the most important developments in modern control technology
has been the development of the micro-processor. The micro-processor
subsystem of the student (NOVA 820) laboratory was developed
with three purposes:

1.  To develop a micro-processor board set for use as
a general interfacing tool.

2.  To develop a hardware interface between the NOVA 820
and a micro-processor, and to develop software for
the NOVA 820 which allows simple, interactive
software development for the micro-processor.

3.  To develop software for the micro-processor to do
the graphics and character generation tasks related
to the plasma scope.

All three of these purposes have been accomplished. Future
goals for the subsystem include the addition of another 8 bit
micro-processor board (8080A) and the development of a system based on the
new Data General 16 bit micro-processor.

A diagram of the hardware associated with the micro-processor
is shown in Figure 9. Through a general interface to the
micro-processor's buss, the NOVA 820 can completely control the
micro-processor and load and examine the micro-processor memory. Through a
standard interrupt interface, the NOVA 820 can communicate with the
micro-processor as it would with any other peripheral. This environment
allows great flexibility in the use of the micro-processor.

The  micro-processor  itself  has  8K  of  8 bit,  1 msec  memory,  an 
interrupt  I/O  port,  and  a 16  bit  I/O  buffer.  Expansion  of  the  hardware 
and  software  for  this  subsystem  is  continuing. 


PROGRAM NAME:  ACONT
LANGUAGE:      FORT
CATEGORY:      GENERAL

SWITCH  TYPE   PURPOSE

I       G      INPUT STARTING ADDRESS FROM TTY
R       G      DATA IS REAL (ASSUMED INTEGER OTHERWISE)
O       L      OUTPUT (CONTIGUOUS) FILE - MUST COME FIRST

PURPOSE

TO CONCATENATE A SET OF CONTIGUOUS FILES INTO A SINGLE OUTPUT


PROGRAM NAME:  ACONTS
LANGUAGE:      FORT
CATEGORY:      GENERAL

SWITCH  TYPE   PURPOSE

R       G      DATA IS REAL (ASSUMED INTEGER OTHERWISE)
O       L      CONTIGUOUS OUTPUT FILE

PURPOSE

TO CONCATENATE A SET OF CONTIGUOUS INPUT FILES OF INTEGRAL NUMBER
OF CYLINDERS INTO A SINGLE OUTPUT FILE


PROGRAM NAME:  ADPCM
LANGUAGE:      FORT
CATEGORY:      SPEECH

SWITCH  TYPE   PURPOSE

P       L      PITCH FILE
I       L      INPUT FILE (SPEECH)
O       L      OUTPUT FILE (SPEECH)
C       L      FEEDBACK COEFFICIENT FILE
X       L      QUANTIZED ERROR OUTPUT FILE
E       L      ERROR OUTPUT FILE
M       L      MULTIPLIER OUTPUT FILE
D       L      DATA FILE
L       L      LISTING FILE

PURPOSE

TO SIMULATE GENERAL ADPCM SYSTEMS. SYSTEM IS CONFIGURED BY D
AND INPUT/OUTPUT FILES (E.G., IF A /P FILE IS PRESENT, A PITCH S
ERROR CORRECTION IS DONE)
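
As a rough illustration of the kind of system ADPCM simulates, the following Python sketch runs a minimal first-order ADPCM loop with a 2-bit adaptive quantizer. The predictor coefficient, the step-size multipliers, and all function names are illustrative assumptions; none of them is taken from the ADPCM program itself.

```python
# Minimal first-order ADPCM sketch (illustrative assumptions throughout).
MULTIPLIERS = (0.8, 1.6)  # step-size multipliers for inner / outer level (assumed)
MIN_STEP = 1e-6           # floor that keeps the step size from collapsing to zero


def _advance(code, pred, step, coef):
    """Shared encoder/decoder update: returns (reconstruction, next prediction, next step)."""
    sign = -1.0 if code & 2 else 1.0
    level = 1.5 if code & 1 else 0.5
    recon = pred + sign * level * step
    next_step = max(step * MULTIPLIERS[code & 1], MIN_STEP)
    return recon, coef * recon, next_step


def adpcm_encode(samples, coef=0.9, step0=1.0):
    """Quantize the prediction error of each sample into a 2-bit code."""
    pred, step, codes = 0.0, step0, []
    for s in samples:
        err = s - pred
        code = (2 if err < 0 else 0) | (1 if abs(err) > step else 0)
        codes.append(code)
        _, pred, step = _advance(code, pred, step, coef)
    return codes


def adpcm_decode(codes, coef=0.9, step0=1.0):
    """Rebuild the signal from the codes using the same state updates as the encoder."""
    pred, step, out = 0.0, step0, []
    for code in codes:
        recon, pred, step = _advance(code, pred, step, coef)
        out.append(recon)
    return out
```

Because the encoder and decoder share one update routine, the decoder tracks the encoder's internal reconstruction exactly, which is the property a simulation of this kind relies on.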


PROGRAM NAME:  CPITCH
LANGUAGE:      FORT
CATEGORY:      SPEECH

SWITCH  TYPE   PURPOSE

O       L      OUTPUT PITCH FILE

PURPOSE

TO CREATE A CONSTANT PITCH CONTOUR

PROGRAM NAME:  DECK
LANGUAGE:      FORT
CATEGORY:      GENERAL

SWITCH        TYPE   PURPOSE

P             G      PLAY
W             G      RECORD
(illegible)   G      FAST FORWARD
(illegible)   G      FAST BACKWARD
C             G      USE CROWN INSTEAD OF AMPEX

PURPOSE

ANALOGUE TAPE DRIVE CONTROL PROGRAM

PROGRAM NAME:  DCAADIN
LANGUAGE:      FORTRAN
DATE:          6/9/77
AUTHOR:        T. P. BARNWELL
CATEGORY:      GENERAL

PURPOSE

THIS IS AN INTERACTIVE PROGRAM FOR TRANSFERRING DATA FROM IBM
SPEECH DATA TAPES, ORIGINATING AT DCA, TO DATA GENERAL CONTIGUOUS
FILES. THE PROGRAM IS INTERACTIVE AND SELF-EXPLANATORY


PROGRAM NAME:  DCAAV
LANGUAGE:      FORTRAN
DATE:          6/9/77
AUTHOR:        T. P. BARNWELL
CATEGORY:      GENERAL

PURPOSE

THIS PROGRAM COMPUTES THE AVERAGE OF MANY OBJECTIVE
MEASURES COMPUTED BY OBJECTIVE AND OBJ2. ITS PURPOSE IS TO
GET AN OVERALL MEASURE FROM MANY SINGLE WINDOWED ERRORS


PROGRAM NAME:  DCATAPEIN
LANGUAGE:      FORTRAN
DATE:          6/9/77
AUTHOR:        T. P. BARNWELL
CATEGORY:      GENERAL

PURPOSE

THIS IS AN INTERACTIVE PROGRAM TO TRANSFER AN IBM 9 TRACK
TAPE CODED IN EBCDIC TO AN ASCII FILE ON RDOS FILE STRUCTURE
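
The character conversion at the heart of such a transfer can be sketched in a few lines of Python. This is only an illustration: the source does not say which EBCDIC variant the tapes used, so code page 037 is an assumption, and the function name is hypothetical.

```python
def ebcdic_to_ascii(raw: bytes, codepage: str = "cp037") -> str:
    """Decode EBCDIC bytes and force the result into 7-bit ASCII.

    `codepage` defaults to IBM code page 037 (an assumption; the tape
    format is not specified in the catalog entry). Characters with no
    ASCII equivalent are replaced by '?'.
    """
    text = raw.decode(codepage)
    return text.encode("ascii", errors="replace").decode("ascii")
```

For example, the EBCDIC bytes C8 C5 D3 D3 D6 decode to the ASCII string HELLO under this code page.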


PROGRAM NAME:  DATAMAKE
LANGUAGE:      FORT
CATEGORY:      GENERAL

SWITCH  TYPE   PURPOSE

I       L      INPUT INSTRUCTION FILE
O       L      OUTPUT INSTRUCTION FILE
D       L      DATA FILE

PURPOSE

TO MAKE A NEW DATA FILE FOR THE SYSTEMATIC TESTING OF
ANY SYSTEM


PROGRAM NAME:  DATASTART
LANGUAGE:      FORT
CATEGORY:      GENERAL

PURPOSE

INTERACTIVE PROGRAM FOR CREATING CONTROL FILE FOR DATAMAKE.


PROGRAM NAME:  DFDP
LANGUAGE:      FORT
CATEGORY:      GENERAL

SWITCH  TYPE   PURPOSE

        R A    INPUT DATA FILE (OPTIONAL)
        R A    OUTPUT FILTER COEFFICIENTS
        R A    MAGNITUDE SPECTRUM (OPTIONAL)
        R A    PHASE SPECTRUM (OPTIONAL)

PURPOSE

DESIGNS DIGITAL FILTERS


PROGRAM NAME:  DFD
LANGUAGE:      FORT
CATEGORY:      GENERAL

SWITCH        TYPE   PURPOSE

I             R A    INPUT DATA FILE (OPTIONAL)
(illegible)   R A    OUTPUT FILTER COEFFICIENTS
M             R A    MAGNITUDE SPECTRUM (OPTIONAL)
H             R A    PHASE SPECTRUM (OPTIONAL)

PURPOSE

DESIGNS DIGITAL FILTERS


PROGRAM NAME:  DOWN
LANGUAGE:      FORT
CATEGORY:      SPEECH

PURPOSE

TO DROP LOWER ORDER BITS, AND/OR DROP EVERY OTHER OR 2 OUT OF
OR BITS TO REDUCE SAMPLING FREQUENCY
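
The two operations named here, discarding low-order bits and keeping only every second or third sample, can be sketched as follows. This is a minimal illustration, not the DOWN program; the function and parameter names are hypothetical, and no anti-aliasing filter is applied before the samples are dropped.

```python
def down(samples, drop_bits=0, keep_every=1):
    """Reduce word length and/or sampling frequency of integer samples.

    drop_bits  -- number of low-order bits to discard (arithmetic shift,
                  so negative samples round toward minus infinity)
    keep_every -- 2 keeps every other sample, 3 keeps one sample in three
    """
    if drop_bits < 0 or keep_every < 1:
        raise ValueError("drop_bits must be >= 0 and keep_every >= 1")
    return [s >> drop_bits for s in samples[::keep_every]]
```

Calling `down(x, drop_bits=2, keep_every=2)` therefore halves the sampling frequency and shortens each word by two bits in one pass.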


PROGRAM NAME:  FILTER
LANGUAGE:      FORT
CATEGORY:      SPEECH

SWITCH  TYPE   PURPOSE

I       L      INPUT FILE
R       L      RESULT FILE
D       L      DATA FILE

PURPOSE

GENERAL CANONICAL FORM DIGITAL FILTER PROGRAM
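
For readers unfamiliar with the term, a canonical form filter uses a single delay line shared by the feedback and feed-forward sections (direct form II). The Python sketch below shows that structure under stated assumptions: the coefficient ordering, the requirement that `a[0]` equals 1, and the function name are all illustrative choices, not details of the FILTER program.

```python
def canonical_filter(b, a, x):
    """Direct form II (canonical) filter.

    Implements y[n] = sum(b[k] x[n-k]) - sum(a[k] y[n-k], k >= 1),
    assuming a[0] == 1. b and a are padded to a common length.
    """
    order = max(len(b), len(a)) - 1
    b = list(b) + [0.0] * (order + 1 - len(b))
    a = list(a) + [0.0] * (order + 1 - len(a))
    w = [0.0] * order  # shared delay line: w[k-1] holds w[n-k]
    y = []
    for sample in x:
        # Feedback section produces the new internal state value.
        w0 = sample - sum(a[k] * w[k - 1] for k in range(1, order + 1))
        # Feed-forward section reads the same delay line.
        y.append(b[0] * w0 + sum(b[k] * w[k - 1] for k in range(1, order + 1)))
        if order:
            w = [w0] + w[:-1]
    return y
```

With `b = [1.0]` and `a = [1.0, -0.5]` the impulse response is 1, 0.5, 0.25, and so on, as expected for a single real pole at 0.5.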


PROGRAM NAME:  FNORM
LANGUAGE:      FORT
CATEGORY:      SPEECH

SWITCH  TYPE   PURPOSE

I       L      INPUT FILE
R       L      RESULT FILE
D       L      DATA FILE

PURPOSE

TO NORMALIZE A FLOATING POINT FILE

PROGRAM NAME:  FFILTER
LANGUAGE:      FORT
CATEGORY:      SPEECH

SWITCH  TYPE   PURPOSE

I       L      INPUT FILE
R       L      RESULT
D       L      DATA FILE

PURPOSE

FOREGROUND VERSION OF FILTER


PROGRAM NAME:  FILMPY
LANGUAGE:      FORT
CATEGORY:      GENERAL

SWITCH  TYPE   PURPOSE

O       R A    OUTPUT FILTER COEFF
M       R A    MAGNITUDE SPECTRUM (OPTIONAL)
P       R A    PHASE SPECTRUM (OPTIONAL)

PURPOSE

PUTS TOGETHER ANY NUMBER OF DIGITAL FILTERS TO MAKE
ONE FILTER (CASCADE). INPUT FILTER FILES HAVE NO SWITCHES.
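
Cascading filters is equivalent to multiplying their transfer functions, so the combined numerator and denominator are products of the individual polynomials. The sketch below illustrates that idea; representing each filter as a `(b, a)` pair of coefficient lists is an assumption made for the example, not the FILMPY file format.

```python
def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (lowest delay first)."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def cascade(filters):
    """Combine (b, a) filter sections into one equivalent (b, a) pair."""
    b_total, a_total = [1.0], [1.0]
    for b, a in filters:
        b_total = poly_mul(b_total, b)
        a_total = poly_mul(a_total, a)
    return b_total, a_total
```

For instance, cascading `([1, 1], [1])` with `([1, -1], [1, 0.5])` yields the numerator `[1, 0, -1]` and the denominator `[1, 0.5]`.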


PROGRAM NAME:  FILPLT
LANGUAGE:      FORT
CATEGORY:      GENERAL

PURPOSE

F-SWAP PROGRAM FOR DFDP


PROGRAM NAME:  GOGO
LANGUAGE:      FORT
CATEGORY:      SPEECH

PURPOSE

TO INITIALIZE THE CLOCK AND A/D CHANNEL


PROGRAM NAME:  HEAR
LANGUAGE:      ASM
CATEGORY:      SPEECH

SWITCH  TYPE   PURPOSE

*       L      SEE *

PURPOSE

TO SAMPLE INPUT ANALOGUE DATA

* SWITCH DETERMINES SIZE OF SAMPLE IN CYLINDERS
A=1, B=2, ETC.


PROGRAM NAME:  HLPD
LANGUAGE:      FORT
CATEGORY:      SPEECH

SWITCH  TYPE   PURPOSE

I       L      INPUT SPEECH DATA
P       L      OUTPUT PITCH DATA
D       L      DATA FILE
L       L      LISTING FILE

PURPOSE

HARD LIMITED AUTOCORRELATION PITCH DETECTOR
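
The general technique named in this entry can be sketched as follows: the frame is reduced to its sign (hard limiting), and the pitch period is taken as the lag that maximizes the autocorrelation of the resulting two-level signal. This is a generic illustration under assumed parameters (lag range, unnormalized autocorrelation, no voicing decision); it is not a description of how HLPD itself is implemented.

```python
def hard_limit(frame):
    """Replace each sample by +1 or -1 according to its sign."""
    return [1 if s >= 0 else -1 for s in frame]


def hlpd_period(frame, min_lag, max_lag):
    """Return the lag (in samples) with the largest hard-limited autocorrelation."""
    s = hard_limit(frame)
    best_lag, best_val = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Products of +/-1 values reduce to agreement counts, which is
        # what made this approach cheap on minicomputer hardware.
        val = sum(s[n] * s[n + lag] for n in range(len(s) - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag
```

On a test waveform that repeats every 40 samples, searching lags 20 through 100 returns 40.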


PROGRAM NAME:  HIRE
LANGUAGE:      FORT
CATEGORY:      SPEECH

SWITCH  TYPE   PURPOSE

I       I A    INTEGER SPEECH INPUT FILE
O       I A    INTEGER IMPULSE RESPONSE OUTPUT
P       R A    DATA FILE (OPTIONAL)
L       L      LISTING (OPTIONAL)

PURPOSE

HOMOMORPHIC IMPULSE RESPONSE EXTRACTOR


PROGRAM NAME:  LPC
LANGUAGE:      FORT
CATEGORY:      SPEECH

SWITCH  TYPE   PURPOSE

I       L      INPUT SPEECH FILE
C       L      COEF. FILE
K       L      PARCOR COEF. FILE
R       L      AUTO FILE
D       L      DATA FILE
L       L      LISTING FILE

PURPOSE

BASIC BLOCK SYNCHRONOUS AUTOCORRELATION/TOEPLITZ VOCODER TRANSMITTER
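
The autocorrelation/Toeplitz formulation is conventionally solved with the Levinson-Durbin recursion, which yields both predictor coefficients and PARCOR (reflection) coefficients. The sketch below shows that standard recursion; the sign conventions and function names are assumptions for illustration, and the catalog does not state which solver the LPC program actually uses.

```python
def autocorrelation(frame, order):
    """Return r[0..order] for one analysis frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(order + 1)]


def levinson_durbin(r, order):
    """Solve the Toeplitz normal equations.

    Returns (a, k, err) where the prediction is
    x_hat[n] = sum(a[j-1] * x[n-j], j = 1..order), k holds the PARCOR
    coefficients, and err is the final prediction error power.
    Assumes r[0] > 0 (a non-silent frame).
    """
    a = [0.0] * (order + 1)
    k = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k[i] = acc / err
        new_a = a[:]
        new_a[i] = k[i]
        for j in range(1, i):
            new_a[j] = a[j] - k[i] * a[i - j]
        a = new_a
        err *= 1.0 - k[i] * k[i]
    return a[1:], k[1:], err
```

For the autocorrelation sequence 1, 0.5, 0.25 of a first-order process, a second-order analysis returns predictor coefficients 0.5 and 0, so the extra stage correctly contributes nothing.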


PROGRAM NAME:  LPR
LANGUAGE:      FORTRAN
DATE:          6/9/77
AUTHOR:        T. P. BARNWELL
CATEGORY:      SPEECH

SWITCH  TYPE    PURPOSE

A       LOCAL   AREA FUNCTIONS
K       LOCAL   PARCOR COEFFICIENTS
C       LOCAL   FEEDBACK COEFFICIENTS
O       LOCAL   FEEDBACK COEFFICIENTS
D       LOCAL   BATCH (DATA) CONTROL FILE
R       LOCAL   AUTOCORRELATION COEFFICIENTS
P       LOCAL   PITCH FILE
L       LOCAL   LISTING FILE
X       LOCAL   EXCITATION OUTPUT FILE

PURPOSE

THIS IS A GENERAL PURPOSE LPC RECEIVER PROGRAM. IT RECONFIGURES
ITSELF DEPENDING ON WHAT FILES APPEAR IN ITS INPUT COMMAND
LINE. IF ITS "X" LINES ARE COMPILED, THE PROGRAM CAN ADD
SEVERAL DISTORTIONS TO THE OUTPUT SPEECH, INCLUDING UNIFORM
BANDWIDTH DISTORTION AND UNIFORM FREQUENCY DISTORTION. IT MAY
THUS BE USED TO CORRECT HELIUM SPEECH OR INSTALL CONTROLLED
DISTORTIONS ON THE OUTPUT


PROGRAM NAME:  LOOK
LANGUAGE:      FORT
CATEGORY:      GENERAL

SWITCH  TYPE   PURPOSE

I       L      DATA FILE

PURPOSE

INTERACTIVE GRAPHICS INTERPRETER WHICH ALLOWS UP TO EIGHT PLOTS
BASED ON UP TO 4 FILES ON THE 4010 GRAPHICS TERMINAL


PROGRAM NAME:  MBPD
LANGUAGE:      FORT
CATEGORY:      SPEECH

SWITCH  TYPE   PURPOSE

A       I A    UNFILTERED SPEECH INPUT
B       I A    50-100HZ FILTERED SPEECH
C       I A    100-200HZ FILTERED SPEECH
D       I A    200-400HZ FILTERED SPEECH
E       I A    400-800HZ FILTERED SPEECH
I       I A    DATA FILE INPUT (OPTIONAL)
P       R A    PITCH CONTOUR OUTPUT
L       R A    AVERAGE LEVEL INPUT (FROM MBPWR)

PURPOSE

MULTI BAND PITCH DETECTOR


PROGRAM NAME:  MBPLOT
LANGUAGE:      FORT
CATEGORY:      SPEECH

PURPOSE

"F-SWAP" PROGRAM FOR USE WITH MBPD


PROGRAM NAME:  MBPWR
LANGUAGE:      FORT
CATEGORY:      SPEECH

SWITCH  TYPE   PURPOSE

A       I A    UNFILTERED SPEECH INPUT
B       I A    50-100HZ FILTERED SPEECH INPUT
C       I A    100-200HZ FILTERED SPEECH INPUT
D       I A    200-400HZ FILTERED SPEECH INPUT
E       I A    400-800HZ FILTERED SPEECH INPUT
O       I A    LEVEL OUTPUT FILE

PURPOSE

AVERAGE MAGNITUDE LEVEL FOR MBPD


PROGRAM NAME:  NORM
LANGUAGE:      FORT
CATEGORY:      SPEECH

SWITCH  TYPE   PURPOSE

I       L      INPUT FILE
R       L      RESULT FILE
D       L      DATA FILE

PURPOSE

TO NORMALIZE AN INTEGER FILE


PROGRAM NAME:  OBJECTIVE
LANGUAGE:      FORTRAN
DATE:          6/9/77
AUTHOR:        T. P. BARNWELL
CATEGORY:      SPEECH

SWITCH  TYPE    PURPOSE

M       LOCAL   MASTER FILE
S       LOCAL   SLAVE FILE
D       LOCAL   BATCH (DATA) FILE
L       LOCAL   LISTING FILE

PURPOSE

TO COMPUTE THE GAIN WEIGHTED AND NON GAIN WEIGHTED SPECTRAL
DISTANCE METRIC BETWEEN TWO SPECTRUM FILES. THE SPECTRUM
FILES ARE NORMALLY GENERATED BY LPC, PCEP, HIRE, OR SPCANA
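
One plausible reading of "gain weighted" versus "non gain weighted" is sketched below: a per-frame log spectral distance is computed between the master and slave spectra, and the frame distances are then averaged either uniformly or with each frame weighted by its gain. The exact metric and weighting used by OBJECTIVE are not given in this catalog, so the RMS dB distance, the weighting rule, and the function names here are all assumptions.

```python
import math


def log_spectral_distance(master, slave):
    """RMS difference, in dB, between two magnitude spectra of equal length.

    Assumes strictly positive magnitudes (log of zero is undefined).
    """
    diffs = [20.0 * math.log10(m) - 20.0 * math.log10(s)
             for m, s in zip(master, slave)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def average_distance(frame_distances, frame_gains=None):
    """Average per-frame distances, optionally weighting each frame by its gain."""
    if frame_gains is None:
        return sum(frame_distances) / len(frame_distances)
    total_gain = sum(frame_gains)
    return sum(g * d for g, d in zip(frame_gains, frame_distances)) / total_gain
```

Gain weighting of this kind lets loud frames dominate the overall figure, while the unweighted average treats quiet and loud frames alike.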




PROGRAM NAME:  OBJ2
LANGUAGE:      FORTRAN
DATE:          6/9/77
AUTHOR:        T. P. BARNWELL
CATEGORY:      SPEECH

SWITCH  TYPE    PURPOSE

M       LOCAL   MASTER FILE
S       LOCAL   SLAVE FILE
D       LOCAL   BATCH (DATA) FILE
L       LOCAL   LISTING FILE

PURPOSE

TO COMPUTE THE GAIN WEIGHTED AND NON GAIN WEIGHTED NON-SPECTRAL
DISTANCE METRIC BETWEEN TWO SPECTRUM FILES. THE NON-SPECTRUM
FILES ARE NORMALLY GENERATED BY LPC, PCEP, HIRE, OR SPCANA


PROGRAM NAME:  PCEP
LANGUAGE:      FORTRAN
DATE:          6/9/77
AUTHOR:        T. P. BARNWELL
CATEGORY:      SPEECH

SWITCH  TYPE    PURPOSE

D       LOCAL   BATCH (DATA) CONTROL FILE
A       LOCAL   OUTPUT CEPSTRUM FROM A
M       LOCAL   MASTER INPUT
S       LOCAL   SLAVE INPUT (B)
B       LOCAL   OUTPUT CEPSTRUM FROM B
L       LOCAL   LISTING FILE
W       LOCAL   INPUT (ASCII) WINDOW (FIR FILTER) FUNCTION
Z       LOCAL   BINARY POINT BY POINT METRIC

PURPOSE

THIS IS A GENERAL PURPOSE CEPSTRAL COMPARE PROGRAM. IT ALLOWS THE
USER TO COMPARE ANY REGION OF THE OPPOSING CEPSTRUMS AFTER A
WINDOW FUNCTION HAS BEEN APPLIED. THIS ALLOWS THE PROGRAM TO BE
USED FOR BOTH SPECTRAL ENVELOPE AND EXCITATION COMPARISONS
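
The reason one program can serve both comparisons is that a window confined to the low-quefrency end of the cepstrum isolates the spectral envelope, while a window over the higher quefrencies isolates the excitation. The sketch below illustrates that selection with a Euclidean distance; the distance measure and the function name are assumptions, since the catalog does not specify the metric PCEP computes.

```python
import math


def windowed_cepstral_distance(cep_a, cep_b, window):
    """Euclidean distance between two cepstra after applying a window.

    `window` has one weight per cepstral coefficient; zeros exclude a
    region from the comparison entirely.
    """
    if not (len(cep_a) == len(cep_b) == len(window)):
        raise ValueError("cepstra and window must have equal length")
    return math.sqrt(sum((w * (a - b)) ** 2
                         for a, b, w in zip(cep_a, cep_b, window)))
```

With a four-point example, the window `[1, 1, 0, 0]` compares only the envelope region and `[0, 0, 1, 1]` compares only the excitation region of the same pair of cepstra.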


PROGRAM NAME:  PDISTORT
LANGUAGE:      FORTRAN
DATE:          6/9/77
AUTHOR:        T. P. BARNWELL
CATEGORY:      SPEECH

PURPOSE

THIS PROGRAM IS USED TO SYSTEMATICALLY DISTORT PITCH CONTOURS.
THE DISTORTION IS A CONSTANT RISE OR FALL IN THE PITCH PERIOD.
THE DISTORTION ONLY OCCURS IN VOICED SEGMENTS, AND THE PROGRAM
IS INTERACTIVE.
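
A minimal sketch of such a distortion is given below. It assumes that the pitch contour is a list of pitch periods in samples, that unvoiced frames are coded as zero, and that "constant rise or fall" means a fixed offset added to every voiced frame; each of these is an assumption made for illustration, and the function name is hypothetical.

```python
def distort_pitch(contour, offset):
    """Add a fixed offset to the pitch period of every voiced frame.

    Frames with a period of zero (assumed to mark unvoiced speech) are
    passed through unchanged.
    """
    return [p + offset if p > 0 else p for p in contour]
```

A positive offset lengthens every voiced pitch period (lowering the pitch) and a negative offset shortens it.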


PROGRAM NAME:  PTGTC
LANGUAGE:      FORT
CATEGORY:      SPEECH

SWITCH  TYPE   PURPOSE

               PITCH FILE
               INPUT SPEECH
               INPUT FILTERED SPEECH

PURPOSE

TO HAND PAINT A PITCH CONTOUR FOR TESTING


PROGRAM NAME:  PCHECH
LANGUAGE:      FORT
CATEGORY:      SPEECH

SWITCH  TYPE   PURPOSE

               INPUT STATISTICS FILE
               OUTPUT STATISTICS FILE
               DATA FILE
               ADD ON HISTOGRAM IN
               ADD ON HISTOGRAM OUT
               LISTING

PURPOSE

TO CHECK THE OUTPUT OF A PITCH PERIOD ESTIMATOR AGAINST
A HAND PAINTED PITCH CONTOUR


PROGRAM NAME:  PRNT
LANGUAGE:      FORT
CATEGORY:      GENERAL

PURPOSE

TO PRINT A PROGRAM WITH FILE NAME AND DATE


PROGRAM NAME:  SCALE
LANGUAGE:      FORT
CATEGORY:      SPEECH

SWITCH  TYPE   PURPOSE

I       L      INPUT FILE
R       L      RESULT FILE

PURPOSE

TO SCALE A DATA FILE FOR FILTER

PROGRAM NAME:  SF
LANGUAGE:      FORT
CATEGORY:      GENERAL

SWITCH  TYPE   PURPOSE

I       L      INPUT FILE
D       L      DATA FILE
R       L      RESULT FILE
C       L      COEF. FILE

PURPOSE

TIME VARYING DIGITAL FILTER PROGRAM

PROGRAM NAME:  SPCANA
LANGUAGE:      FORTRAN
DATE:          6/9/77
AUTHOR:        T. P. BARNWELL
CATEGORY:      SPEECH

SWITCH  TYPE    PURPOSE

I       LOCAL   INPUT FILE
O       LOCAL   OUTPUT SPECTRUM
D       LOCAL   BATCH (DATA) CONTROL FILE
L       LOCAL   LOG SPECTRUM OUTPUT FILE

PURPOSE

THIS IS A GENERAL PURPOSE SPECTRUM ANALYSIS PROGRAM DESIGNED
TO DO CEPSTRUM OR LPC DECONVOLVED SPECTRUM.


PROGRAM NAME:  ZCPD
LANGUAGE:      FORT
CATEGORY:      SPEECH

SWITCH        TYPE   PURPOSE

I             L      INPUT FILE
(illegible)   L      OUTPUT PITCH CONTOUR
D             L      DATA FILE (OPTIONAL)

PURPOSE

ZERO CROSSING PITCH DETECTOR
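
In its simplest form, a zero crossing pitch detector estimates the pitch period from the spacing of positive-going zero crossings. The sketch below shows that basic idea only; practical detectors normally low-pass filter the speech first so that formant ripple does not add extra crossings, and nothing here is taken from the ZCPD program itself. The function name is hypothetical.

```python
def zero_crossing_period(frame):
    """Estimate the pitch period as the mean spacing of positive-going zero crossings.

    Returns None when fewer than two crossings are found, which can be
    treated as an unvoiced or undecidable frame.
    """
    crossings = [n for n in range(1, len(frame))
                 if frame[n - 1] < 0 <= frame[n]]
    if len(crossings) < 2:
        return None
    return (crossings[-1] - crossings[0]) / (len(crossings) - 1)
```

On a clean waveform that repeats every 40 samples the estimate is 40 samples; dividing the sampling frequency by this value gives the pitch in Hz.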


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